Introduction


This document highlights the BIOS and software modifications 
required to fully support the K86™ family of processors, which 
includes the AMD-K5™ processor and the AMD-K6™ MMX 
processor. 


There can be more than one way to implement the functionality 
detailed in this document, and the information provided is for 
demonstration purposes. 


It is assumed that the reader possesses the proper knowledge of 
the K86 processors, the x86 architecture, and programming 
requirements to understand the information presented in this 
document. 
CPU Identification Algorithms


The CPUID instruction provides complete information about 
the processor (vendor, type, name, etc.) and its capabilities 
(features). After detecting the processor and its capabilities, 
software can be accurately tuned to the system for maximum
performance and benefit to users. For example, game software
can test the performance level available from a particular
processor by detecting the type or speed of the processor. If the 
performance level is high enough, the software can enable 
additional capabilities or more advanced algorithms. Another 
example involves testing for the presence of multimedia 
extensions (MMX) on the processor. If the software finds this 
feature present when it checks the feature bits, it can utilize 
these more powerful extensions for better performance on new 
multimedia software. 


For more detailed information, refer to the AMD Processor
Recognition Application Note, order# 20734, located at
http://www.amd.com.


Tables 1 and 2 outline the family codes and model codes for the 
AMD K86 processors. Table 1 shows the CPU speed, the 
‘P-Rating’, and the recommended BIOS boot-string associated 
with each AMD-K5 processor. 


Table 2 shows the recommended BIOS boot-string for the 
AMD-K6 MMX processor. This recommended boot-string is 
‘AMD-K6/PR2-XXX’. The value for XXX is determined by 





calculating the core frequency of the processor. Use the Time
Stamp Counter (TSC) to ‘clock’ a timed operation and compare 
the result to the Real Time Clock (RTC) to determine the 
operating frequency. 
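
The frequency calculation described above can be sketched in C as follows. This is a minimal illustration, not AMD's reference code: the function names are hypothetical, the two TSC samples and the RTC-timed interval are assumed to have been captured elsewhere, and how the measured frequency maps to the XXX value is not stated here, so the caller supplies XXX.

```c
#include <stdint.h>
#include <stdio.h>

/* Core frequency in MHz from two TSC samples taken elapsed_us microseconds
 * apart (the interval is assumed to be timed against the RTC). The TSC counts
 * core clocks, so clocks per microsecond equals MHz. Rounds to nearest MHz. */
static uint32_t core_mhz(uint64_t tsc_start, uint64_t tsc_end,
                         uint32_t elapsed_us)
{
    if (elapsed_us == 0)
        return 0;                        /* no valid interval was measured */
    uint64_t clocks = tsc_end - tsc_start;
    return (uint32_t)((clocks + elapsed_us / 2) / elapsed_us);
}

/* Hypothetical helper: format the recommended AMD-K6 boot-string for a
 * caller-supplied XXX value. Returns the snprintf result. */
static int k6_boot_string(char *buf, size_t len, unsigned xxx)
{
    return snprintf(buf, len, "AMD-K6/PR2-%u", xxx);
}
```

A longer timed interval reduces the effect of RTC granularity on the result.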


Note: Tables 1 and 2 contain information intended to prepare the 
infrastructure for potential future products. These products 
may or may not be announced, but BIOS software should be 
prepared to support these options. 


Table 1. Summary of AMD-K5™ Processor CPU IDs and BIOS Boot Strings

Instruction Family Code | CPU Speed | P-Rating | Recommended BIOS Boot-String | CPUID Functions 8000_0002h, 8000_0003h, 8000_0004h Return Values
(AMD-K5 Processor) | | | AMD-K5-PR200 | AMD-K5(tm) Processor





Table 2. Summary of AMD-K6™ MMX Processor CPU IDs and BIOS Boot Strings

Instruction Family Code | CPU Speed | CPU Bus Speed | Recommended BIOS Boot-String Display
5 (AMD-K6 MMX Processor) | TBD | 60 | AMD-K6/PR2-XXX
5 (AMD-K6 MMX Processor) | TBD | 66 | AMD-K6/PR2-XXX
AMD-K5™ Processor





The AMD-K5 processor is socket 7-compatible and 
software-compatible with Pentium. Compatible in this sense 
means the devices are pin-for-pin compatible and that the same 
software can be executed on both processors with no software 
modifications. 


The BIOS for the AMD-K5 processor requires minimal changes
to fully support the AMD-K5 processor family.


BIOS Consideration Checklist 


CPUID 


■ Use the CPUID instruction to properly identify the AMD-K5
processor.


■ Determine the processor type, stepping, and features using
functions 0000_0001h and 8000_0001h of the CPUID
instruction.

■ Boot-up display: The processor name is retrieved using
CPUID extended functions 8000_0002h, 8000_0003h, and 
8000_0004h. See “CPU Identification Algorithms” on page 3 
for more information. 
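
As an illustration of the checklist above, the sketch below decodes a processor signature and assembles the boot-up name string. It assumes the conventional x86 layout (family in bits 11-8, model in bits 7-4, stepping in bits 3-0 of the function 0000_0001h EAX value) and that the name string is returned in EAX, EBX, ECX, EDX order with the least significant byte first. The type and function names are hypothetical, and the register values are passed in rather than read with CPUID so the logic can be exercised anywhere.

```c
#include <stdint.h>

/* One CPUID result: EAX, EBX, ECX, EDX. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;

/* Fields of the signature returned in EAX by function 0000_0001h. */
static unsigned sig_family(uint32_t eax)   { return (eax >> 8) & 0xF; }
static unsigned sig_model(uint32_t eax)    { return (eax >> 4) & 0xF; }
static unsigned sig_stepping(uint32_t eax) { return eax & 0xF; }

/* Build the 48-character name from the results of functions 8000_0002h,
 * 8000_0003h, and 8000_0004h (r[0], r[1], r[2]). Each register contributes
 * four characters, least significant byte first. */
static void cpu_name(const cpuid_regs r[3], char out[49])
{
    for (int i = 0; i < 3; i++) {
        const uint32_t w[4] = { r[i].eax, r[i].ebx, r[i].ecx, r[i].edx };
        for (int j = 0; j < 4; j++)
            for (int b = 0; b < 4; b++)
                out[i * 16 + j * 4 + b] = (char)((w[j] >> (8 * b)) & 0xFF);
    }
    out[48] = '\0';                      /* always terminate the string */
}
```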




Preliminary Information 


AMD K86™ Family BIOS and Software Tools Developers Guide 21062C/0—March 1997 


CPU Speed Detection 


■ Use speed detection algorithms that do not rely on
repetitive instruction sequences.


■ Use the Time Stamp Counter (TSC) to ‘clock’ a timed
operation and compare the result to the Real Time Clock
(RTC) to determine the operating frequency. See the
example of frequency-determination assembler code
available on the AMD website at http://www.amd.com.


■ Display the P-Rating shown in Table 1, “Summary of
AMD-K5™ Processor CPU IDs and BIOS Boot Strings,” on
page 4.


Model-Specific Registers (MSRs)


■ Access only MSRs implemented in the AMD-K5 processor.


■ Program the write allocate registers: Hardware
Configuration Register (HWCR), Write Allocate
Top-of-Memory and Control Register (WATMCR), and Write
Allocate Programmable Memory Range Register
(WAPMRR). See “Write Allocate Registers” on page 82 and
the Implementation of Write Allocate in the K86™ Processors
Application Note, order# 21326, for more information.


Cache Testing


■ Perform cache testing on the AMD-K5 processor using the
Array Access Register MSR. See “Array Access Register
(AAR)” on page 28 for more information.


SMM Issues


■ The System Management Mode (SMM) functionality of the
AMD-K5 processor is identical to Pentium.


■ Implement the AMD-K5 processor SMM state-save area in
the same manner as Pentium except for the IDT Base and
possibly Pentium-reserved areas. See “AMD-K5™ Processor
System Management Mode (SMM)” on page 7 for more
information. 
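
Before programming the write allocate registers, a BIOS needs to know whether the installed part supports the feature. This guide states that the AMD-K5 processor supports write allocate only on Models 1, 2, and 3 with a stepping of 4 or greater. The sketch below encodes that rule. It assumes the caller has already identified the processor as an AMD-K5 and that the signature uses the conventional layout (model in bits 7-4, stepping in bits 3-0); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an AMD-K5 signature reports a model and stepping for which
 * write allocate is supported (Models 1-3, stepping 4 or greater). */
static bool k5_supports_write_allocate(uint32_t signature)
{
    unsigned model    = (signature >> 4) & 0xF;
    unsigned stepping = signature & 0xF;
    return model >= 1 && model <= 3 && stepping >= 4;
}
```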


AMD-K5™ Processor System Management Mode (SMM)


System Management Mode (SMM) is an alternate operating 
mode entered by way of a system management interrupt (SMI) 
and handled by an interrupt service routine. SMM is designed 
for system control activities such as power management. These 
activities appear transparent to conventional operating 
systems like DOS and Windows. SMM is primarily targeted for 
use by the Basic Input Output System (BIOS) and specialized 
low-level device drivers. The code and data for SMM are stored 
in the SMM memory area, which is isolated from main memory. 


The processor enters SMM by the system logic’s assertion of the
SMI interrupt and the processor’s acknowledgment by the
assertion of SMIACT. At this point the processor saves its state
into the SMM memory state-save area and jumps to the SMM 
service routine. The processor returns from SMM when it 
executes the RSM (resume) instruction from within the SMM 
service routine. Subsequently, the processor restores its state 
from the SMM save area, de-asserts SMIACT, and resumes 
execution with the instruction following the point where it 
entered SMM. 


The following sections summarize the SMM state-save area, 
entry into and exit from SMM, exceptions and interrupts in 
SMM, memory allocation and addressing in SMM, and the SMI 
and SMIACT signals. 


Operating Mode and Default Register Values 




The software environment within SMM has the following 
characteristics: 

■ Addressing and operation in Real mode

■ 4-Gbyte segment limits

■ Default 16-bit operand, address, and stack sizes, although
instruction prefixes can override these defaults

■ Control transfers that do not override the default operand
size truncate the EIP to 16 bits

■ Far jumps or calls cannot transfer control to a segment with
a base address requiring more than 20 bits, as in Real mode
segment-base addressing

■ A20M is masked

■ Interrupt vectors use the Real-mode interrupt vector table

■ The IF flag in EFLAGS is cleared (INTR not recognized)

■ The TF flag in EFLAGS is cleared

■ The NMI and INIT interrupts are disabled

■ Debug register DR7 is cleared (debug traps disabled)


Figure 1 shows the default map of the SMM memory area. It 
consists of a 64-Kbyte area, between 0003_0000h and 
0003_FFFFh, of which the top 32 Kbytes (0003_8000h to 
0003_FFFFh) must be populated with RAM. The default 
code-segment (CS) base address for the area—called the SMM 
base address—is at 0003_0000h. The top 512 bytes 
(0003_FE00h to 0003_FFFFh) contain a fill-down SMM 
state-save area. The default entry point for the SMM service 
routine is 0003_8000h. 
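
The layout described above reduces to a few fixed offsets from the SMM base address. The following sketch derives the key addresses for any base; the macro and function names are illustrative only and are not defined by AMD.

```c
#include <stdint.h>

#define SMM_DEFAULT_BASE       0x00030000u  /* default CS base after RESET   */
#define SMM_ENTRY_OFFSET       0x8000u      /* service routine entry point   */
#define SMM_SAVE_BOTTOM_OFFSET 0xFE00u      /* lowest byte of state-save area */
#define SMM_SAVE_TOP_OFFSET    0xFFFFu      /* highest byte; area fills down */

static uint32_t smm_entry_point(uint32_t base) { return base + SMM_ENTRY_OFFSET; }
static uint32_t smm_save_bottom(uint32_t base) { return base + SMM_SAVE_BOTTOM_OFFSET; }
static uint32_t smm_save_top(uint32_t base)    { return base + SMM_SAVE_TOP_OFFSET; }

/* Size of the state-save area in bytes (FE00h through FFFFh inclusive). */
static uint32_t smm_save_size(void)
{
    return SMM_SAVE_TOP_OFFSET - SMM_SAVE_BOTTOM_OFFSET + 1u;
}
```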


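0003_FFFFh   Top of SMM memory area; the SMM state-save area fills down from here
0003_FE00h   Bottom of the SMM state-save area
0003_8000h   Service routine entry point; start of the 32-Kbyte minimum RAM
             that holds the SMM service routine
0003_0000h   SMM base address (CS)


Figure 1. SMM Memory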
SMM Initial Register Values


Table 3 shows the initial state of registers when entering SMM. 





















Table 3. Initial State of Registers in SMM

Register | Initial Contents
General-Purpose Registers | Unmodified
EFLAGS |
CR0 | Bits 0, 2, 3, and 31 cleared (PE, EM, TS, and PG); remainder are unmodified
GDTR | Unmodified
LDTR | Unmodified
IDTR | Unmodified
 | Unmodified
 | Undefined
EIP | 0000_8000h
CS | Selector 3000h; limit 4 Gbytes
DS, ES, FS, GS, SS | Limit 4 Gbytes


SMM State-Save Area



When the processor acknowledges an SMI interrupt by
asserting SMIACT, it saves its state in the 512-byte SMM
state-save area shown in Table 4. The save begins at the top of
the SMM memory area (SMM base address + FFFFh) and fills
down to SMM base address + FE00h.


Table 4 shows the offsets in the SMM state-save area relative to
the SMM base address. The SMM service routine can alter any
of the read and write values in the state-save area. The contents
of any reserved locations in the state-save area are not
necessarily the same between the AMD-K5 processor and
Pentium or 486 processors.
Table 4. SMM State-Save Area Map

Offset (Hex) | Contents
FFCC | DR6 (FFFF_CFF3h)
 | TSS
 | I/O Restart ESI
 | I/O Restart ECX
 | I/O Restart EDI
FF00 | I/O Trap Restart Slot
SMM Revision Identifier


The SMM revision identifier at offset FEFCh in the SMM 
state-save area specifies the version of SMM and the extensions 
available on the processor. The SMM revision identifier fields, 
shown in Table 5, are as follows: 


■ Bits 31-18: reserved
■ Bit 17: SMM base address relocation (always 1 = enabled)
■ Bit 16: I/O trap restart (always 1 = enabled)
■ Bits 15-0: SMM revision level = 0000


Table 5. SMM Revision Identifier Fields

Bits 31-18 | Bit 17 | Bit 16 | Bits 15-0
Reserved | SMM Base Relocation | I/O Trap Extension | SMM Revision Level


Note: The I/O trap restart and the SMM base address relocation 
functions are always enabled in the AMD-K5 processor and 
do not need to be specifically enabled. 
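
The field layout above can be decoded with simple shifts and masks. This sketch uses a hypothetical struct and function name, and assumes the 32-bit value has already been read from offset FEFCh of the state-save area.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     base_relocation;  /* bit 17: SMM base address relocation */
    bool     io_trap_restart;  /* bit 16: I/O trap restart            */
    uint16_t revision;         /* bits 15-0: SMM revision level       */
} smm_rev_id;

/* Split the SMM revision identifier into its fields; bits 31-18 are
 * reserved and ignored. */
static smm_rev_id smm_decode_rev_id(uint32_t value)
{
    smm_rev_id id;
    id.base_relocation = ((value >> 17) & 1u) != 0;
    id.io_trap_restart = ((value >> 16) & 1u) != 0;
    id.revision        = (uint16_t)(value & 0xFFFFu);
    return id;
}
```

With both extension bits set and a revision level of 0000, the low 18 bits of the identifier are 3_0000h.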





SMM Base Address 


During RESET, the processor sets the code-segment (CS) base 
address for the SMM memory area—the SMM base address— 
to its default, 0003_0000h. The SMM base address at offset 
FEF8h in the SMM state-save area can be changed by the SMM 
service routine to any address aligned to a 32-Kbyte boundary. 
(Locations not aligned to a 32-Kbyte boundary cause the
processor to enter the Shutdown state when executing the RSM
instruction.)


In some operating environments it may be desirable to relocate 
the 64-Kbyte SMM memory area to a high memory area to 
provide more low memory for legacy software. During system 
initialization, the base of the 64-Kbyte SMM memory area is 
relocated by the BIOS. To relocate the SMM base address, the 
system enters the SMM handler at the default address. This 
handler changes the SMM base address location in the SMM 
state-save area, copies the SMM handler to the new location, 
and exits SMM. 
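
The relocation step performed by the default-address handler can be sketched as follows. The example models the 64-Kbyte SMM memory area as a byte array so it can run outside SMM; in a real handler the pointer would refer to the memory at the current SMM base address. The function names are hypothetical. The alignment check reflects the 32-Kbyte requirement stated above, and copying the handler to the new location is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMM_BASE_SLOT 0xFEF8u   /* offset of the SMM base address field */

/* The SMM base address must be aligned to a 32-Kbyte boundary. */
static bool smm_base_is_valid(uint32_t new_base)
{
    return (new_base & 0x7FFFu) == 0;
}

/* Write new_base into the state-save area of the SMM memory area that
 * starts at smm_area. Returns false, leaving memory untouched, if the
 * address is misaligned (an RSM would otherwise cause Shutdown). */
static bool smm_set_new_base(uint8_t *smm_area, uint32_t new_base)
{
    if (!smm_base_is_valid(new_base))
        return false;
    for (int i = 0; i < 4; i++)          /* little-endian dword store */
        smm_area[SMM_BASE_SLOT + i] = (uint8_t)(new_base >> (8 * i));
    return true;
}
```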
The next time SMM is entered, the processor saves its state at
the new base address. This new address is used for every SMM
entry until the SMM base address in the SMM state-save area is
changed or a hardware reset occurs.


Auto Halt Restart Slot 




During entry into SMM, the halt restart slot at offset FF02h in
the SMM state-save area indicates whether SMM was entered 
from the Halt state. Before returning from SMM, the halt 
restart slot can be written to by the SMM service routine to 
specify whether the return from SMM should take the 
processor back to the Halt state or to the instruction-execution 
state specified by the SMM state-save area. 


On entry into SMM, the halt restart slot is configured as 
follows: 


■ Bits 15-1: Undefined
■ Bit 0: Point of entry to SMM:
   1 = entered from Halt state
   0 = not entered from Halt state


After entry into the SMI handler and before returning from
SMM, the halt restart slot can be written using the following
definition:

■ Bits 15-1: Undefined
■ Bit 0: Point of return from SMM:
   1 = return to Halt state
   0 = return to state specified by SMM state-save area


If the return from SMM takes the processor back to the Halt 
state, the HLT instruction is not re-executed, but the Halt 
special bus cycle is driven on the bus after the return. 
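
A handler's use of the halt restart slot might look like the sketch below. As an assumption for illustration, the SMM memory area is modeled as a byte array starting at the SMM base address, and the helper names are hypothetical. Only bit 0 is interpreted or changed, because bits 15-1 are undefined.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMM_HALT_RESTART_SLOT 0xFF02u   /* offset of the halt restart slot */

/* True if SMM was entered from the Halt state (bit 0 set on entry). */
static bool smm_entered_from_halt(const uint8_t *smm_area)
{
    return (smm_area[SMM_HALT_RESTART_SLOT] & 1u) != 0;
}

/* Select the return point before RSM: true returns to the Halt state,
 * false returns to the state held in the state-save area. */
static void smm_set_return_to_halt(uint8_t *smm_area, bool halt)
{
    uint8_t low = smm_area[SMM_HALT_RESTART_SLOT];
    low = (uint8_t)((low & ~1u) | (halt ? 1u : 0u));
    smm_area[SMM_HALT_RESTART_SLOT] = low;
}
```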
I/O Trap Dword


If the assertion of SMI is recognized on the boundary of an I/O 
instruction, the I/O trap dword at offset FFA4h in the SMM 
state-save area contains information about the instruction. The 
fields of the I/O trap dword, shown in Table 6, are configured as 
follows: 


■ Bits 31-16: I/O port address
■ Bit 15: I/O string operation (1 = string, 0 = non-string)
■ Bits 14-2: reserved
■ Bit 1: Valid I/O instruction (1 = valid, 0 = invalid)
■ Bit 0: Input or output instruction (1 = INx, 0 = OUTx)


Table 6. I/O Trap Dword Fields

Bits 31-16 | Bit 15 | Bits 14-2 | Bit 1 | Bit 0
I/O Port Address | I/O String Operation | Reserved | Valid I/O Instruction | Input or Output


The I/O trap dword is related to the I/O trap restart slot, 
described below. Bit 1 of the I/O trap dword (the valid bit) 
should be tested if the I/O trap restart slot is to be changed. 
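
The I/O trap dword fields can be unpacked as shown in this sketch. The struct and function names are hypothetical; the value is assumed to have been read from offset FFA4h of the state-save area.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t port;       /* bits 31-16: I/O port address          */
    bool     is_string;  /* bit 15: 1 = string, 0 = non-string    */
    bool     valid;      /* bit 1: 1 = valid I/O instruction      */
    bool     is_input;   /* bit 0: 1 = INx, 0 = OUTx              */
} io_trap_info;

/* Decode the I/O trap dword; reserved bits 14-2 are ignored. */
static io_trap_info smm_decode_io_trap(uint32_t dword)
{
    io_trap_info t;
    t.port      = (uint16_t)(dword >> 16);
    t.is_string = ((dword >> 15) & 1u) != 0;
    t.valid     = ((dword >> 1) & 1u) != 0;
    t.is_input  = (dword & 1u) != 0;
    return t;
}
```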


I/O Trap Restart Slot 


The I/O trap restart slot at offset FF00h in the SMM state-save
area specifies whether the trapped I/O instruction should be
re-executed on return from SMM. This slot in the state-save
area is called the I/O instruction restart function. Re-executing
a trapped I/O instruction is useful, for example, if an I/O write
occurs to a disk that is powered down. The system logic 
monitoring such an access can assert SMI. Then the SMM 
service routine can query the system logic, detect a failed I/O 
write, take action to power-up the I/O device, enable the I/O 
trap restart slot feature, and return from SMM. 
The fields of the I/O trap restart slot are defined as follows:

■ Bits 31-16: reserved
■ Bits 15-0: I/O instruction restart on return from SMM:
   0000h = execute the next instruction after the trapped I/O instruction
   00FFh = re-execute the trapped I/O instruction


Table 7 shows the format of the I/O trap restart slot.


Table 7. I/O Trap Restart Slot

Bits 31-16 | Bits 15-0
Reserved | I/O instruction restart on return from SMM: 0000h = execute the next instruction after the trapped I/O instruction; 00FFh = re-execute the trapped I/O instruction





The processor initializes the I/O trap restart slot to 0000h upon 
entry into SMM. If SMM is entered as a result of a trapped I/O 
instruction, the processor indicates the validity of the I/O 
instruction by setting or clearing bit 1 of the I/O trap dword at 
offset FFA4h in the SMM state-save area. The SMM service 
routine should test bit 1 of the I/O trap dword to determine if a 
valid I/O instruction was being executed when entering SMM 
and before writing the I/O trap restart slot. If the I/O instruction 
is valid, the SMM service routine can safely rewrite the I/O trap 
restart slot with the value 00FFh, causing the processor to
re-execute the trapped I/O instruction when the RSM 
instruction is executed. If the I/O instruction is invalid, writing 
the I/O trap restart slot has undefined results. 
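
The test-then-write sequence described above can be sketched as follows. For illustration, the SMM memory area is modeled as a byte array beginning at the SMM base address, and the function name is hypothetical. The slot is written only when bit 1 of the I/O trap dword reports a valid I/O instruction, because writing it otherwise has undefined results.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMM_IO_TRAP_DWORD   0xFFA4u  /* offset of the I/O trap dword        */
#define SMM_IO_RESTART_SLOT 0xFF00u  /* offset of the I/O trap restart slot */

/* Request re-execution of the trapped I/O instruction on RSM.
 * Returns false, leaving the slot untouched, if the trap is not valid. */
static bool smm_request_io_restart(uint8_t *smm_area)
{
    /* Bit 1 of the dword lives in its lowest-addressed byte. */
    if ((smm_area[SMM_IO_TRAP_DWORD] & 0x02u) == 0)
        return false;
    smm_area[SMM_IO_RESTART_SLOT]     = 0xFF;   /* low byte of 00FFh  */
    smm_area[SMM_IO_RESTART_SLOT + 1] = 0x00;   /* high byte of 00FFh */
    return true;
}
```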


If a second SMI is asserted and a valid I/O instruction was 
trapped by the first SMM handler, the CPU services the second 
SMI prior to re-executing the trapped I/O instruction. The 
second entry into SMM never has bit 1 of the I/O trap dword set, 
and the second SMM service routine must not rewrite the I/O 
trap restart slot. 


During a simultaneous SMI I/O instruction trap and debug
breakpoint trap, the AMD-K5 processor first responds to the
SMI and postpones recognizing the debug exception until after
returning from SMM via the RSM instruction. If the debug
registers DR3-DR0 are used while in SMM, they must be saved
and restored by the SMM handler. The processor automatically
saves and restores DR7-DR6. If the I/O trap restart slot in the
SMM state-save area contains the value 00FFh when the RSM
instruction is executed, the debug trap does not occur until
after the I/O instruction is re-executed.


Exceptions and Interrupts in SMM


When SMM is entered, the processor disables both INTR and
NMI interrupts. The processor disables INTR interrupts by 
clearing the IF flag in the EFLAGS register. To enable INTR 
interrupts within SMM, the SMM handler must set the IF flag to 
1. 


Generating an INTR interrupt is a method for unmasking NMI 
interrupts in SMM. The processor recognizes the assertion of 
NMI within SMM immediately after the completion of an IRET. 
The NMI can thus be enabled by using a dummy INTR 
interrupt. Once NMI is recognized within SMM, NMI 
recognition remains enabled until SMM is exited, at which 
point NMI masking is restored to the state it was in before 
entering SMM. 


Because the IF flag is cleared when entering SMM, the HLT 
instruction should not be executed in SMM without first setting 
the IF bit to 1. Setting this bit to 1 enables the processor to exit 
the Halt state by means of an INTR interrupt. 


Table 8 summarizes the behavior of all interrupts in SMM. 
Table 8. Summary of Interrupts and Exceptions

Priority | Description | Type | Sampling | Vector | Acknowledgment | Point of Interruptibility
 | INTn instructions and all other software exceptions | exceptions | | 0-255 | | Entry to service routine
2 | BUSCHK | interrupt | level-sensitive | 18 | none | Entry to service routine
 | R/S | interrupt | level-sensitive | none | PRDY | Negation of R/S
4 | FLUSH | interrupt | edge-triggered | | FLUSH-Acknowledge special bus cycle | BRDY of FLUSH-Acknowledge bus cycle
 | SMI | interrupt | edge-triggered | | SMIACT | Entry to SMM service routine
7 | NMI | interrupt | edge-triggered | 2 | | NMI interrupts: IRET from service routine. All others: Entry to service routine.
 | INTR | interrupt | level-sensitive | | Interrupt acknowledge special bus cycle | Entry to service routine
 | STPCLK | interrupt | level-sensitive | | Stop Grant special bus cycle | Negation of STPCLK


1. For interrupts with vectors, the processor saves its state prior to accessing the service routine and changing the program flow.
Interrupts without vectors do not change program flow; instead, they simply pause program flow for the duration of the interrupt
function and return to where they left off.

2. If the Machine Check Enable (MCE) bit in CR4 is set to 1.

3. The entry point for the SMI interrupt handler is at offset 8000h from the SMM Base Address.

4. The edge-triggered interrupts are latched when asserted. All interrupts are recognized at the next instruction retirement
boundary.

5. If a bus cycle is in progress, EWBE must be asserted before the interrupt is recognized.

6. For external interrupts (most exceptions, by contrast, are recognized when they occur). External interrupts are recognized at
instruction boundaries. When MOV or POP instructions load SS, interruptibility is delayed until after the next instruction, thus
allowing both SS and the corresponding SP to load.

7. After assertion of SMI, subsequent assertions of SMI are masked to prevent recursive entry into SMM. However, other exceptions
or interrupts (except INIT and NMI) are taken in the SMM service routine.
AMD-K5™ Processor RESET State


The state of all architecture registers and Model-Specific
Registers (MSRs) after the AMD-K5 processor has completed
its initialization due to the recognition of the assertion of
RESET is shown in Table 9.


Table 9. State of the AMD-K5™ Processor After RESET

Register | RESET State | Notes
GDTR | base: 0000_0000h, limit: FFFFh |
IDTR | base: 0000_0000h, limit: FFFFh |
EAX | 0000_0000h | 1
EBX | 0000_0000h |
ECX | 0000_0000h |
EDX | | 2
ESI | 0000_0000h |
EDI | 0000_0000h |
ESP | 0000_0000h |
FPU Stack R7-R0 | 0000_0000_0000_0000_0000h |
FPU Control Word | 0040h |
FPU Status Word | 0000h |
FPU Tag Word | 5555h |
FPU Instruction Pointer | 0000_0000_0000h |
FPU Data Pointer | 0000_0000_0000h |
FPU Opcode Register | 000_0000_0000b |
CR0 | 6000_0010h |
CR2 | 0000_0000h |
CR3 | 0000_0000h |
CR4 | 0000_0000h |
DR7 | 0000_0400h |
DR6 | FFFF_0FF0h |
DR3 | 0000_0000h |
DR2 | 0000_0000h |
DR1 | 0000_0000h |
DR0 | 0000_0000h |
MCAR | 0000_0000_0000_0000h | 3
MCTR | 0000_0000_0000_0000h | 3
TR12 | 0000_0000_0000_0000h | 3
TSC | 0000_0000_0000_0000h | 3
AAR | | 3
HWCR | | 3
WATMCR | | 3, 4
WAPMRR | | 3, 4

Notes:
1. The contents of EAX indicate if BIST was successful. If EAX = 0000_0000h, then BIST
was successful. If EAX is non-zero, BIST failed.
2. EDX contains the AMD-K5 processor signature, which consists of the instruction
family, model, and stepping.
3. These MSRs are described in “AMD-K5™ Processor x86 Architecture Extensions” on
page 57.
4. The AMD-K5 processor supports write allocate only on Models 1, 2, and 3, with a
Stepping of 4 or greater.
Segment Register Attributes


The selector portion of all segment registers is cleared. The 
access rights and attribute fields are set up as shown in Table 
10. 


Table 10. Segment Register Attribute Fields Initial Values 


Application segment (except LDTR)





The limit fields are set to FFFFh. For CS, the base address is set
to FFFF_0000h; for all others the base address is 0. Note that
IDTR and GDTR consist of just the base and limit values, which
are initialized to 0 and FFFFh, respectively.


State of the AMD-K5™ Processor After INIT 

The assertion of INIT causes the processor to empty its
pipelines, initialize most of its internal state, and branch to
address FFFF_FFF0h, the same instruction execution starting
point used after RESET. Unlike RESET, the processor
preserves the contents of its caches, the floating-point state, the
SMM base, MSRs, and the CD and NW bits of the CR0 register.


The edge-sensitive interrupts FLUSH and SMI are sampled and 
preserved during the INIT process and are handled accordingly 
after the initialization is complete. However, the processor 
resets any pending NMI interrupt upon sampling INIT asserted. 


INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode. 
AMD-K5™ Processor Test and Debug


The AMD-K5 processor has the following modes in which
processor and system operation can be tested or debugged:


■ Hardware Configuration Register (HWCR): The HWCR is a
MSR that contains configuration bits that enable cache,
branch tracing, debug, and clock control functions.


■ Built-In Self-Test (BIST): Both normal and test access port
(TAP) BIST.


■ Output-Float Test: A test mode that causes the AMD-K5
processor to float all of its output and bidirectional signals.


■ Cache and TLB Testing: The Array Access Register (AAR)
supports writes and reads to any location in the tag and data
arrays of the processor’s on-chip caches and TLBs.


■ Debug Registers: Standard 486 debug functions with an
I/O-breakpoint extension.


■ Branch Tracing: A pair of special bus cycles can be driven
immediately after taken branches to specify information
about the branch instruction and its target. The Hardware
Configuration Register (HWCR) provides support for this
and other debug functions.


■ Functional Redundancy Checking: Support for real-time
testing that uses two processors in a master-checker
relationship.


■ Test Access Port (TAP) Boundary-Scan Testing: The JTAG
test access functions defined by the IEEE Standard Test
Access Port and Boundary-Scan Architecture (IEEE
1149.1-1990) specification.


■ Hardware Debug Tool (HDT): The hardware debug tool
(HDT), sometimes referred to as the debug port or Probe
mode, is a collection of signals, registers, and processor
microcode enabled when external debug logic drives R/S
Low or loads the AMD-K5 processor’s Test Access Port
(TAP) instruction register with the USEHDT instruction.

The test-related signals are described in Chapter 5 of the
AMD-K5™ Processor Technical Reference Manual, order# 18524.
The signals include the following:


■ FLUSH
■ FRCMC
■ IERR
■ INIT
■ PRDY
■ R/S
■ RESET
■ TCK
■ TDI
■ TDO
■ TMS
■ TRST


The sections that follow provide details on each of the test and 
debug features. 


Hardware Configuration Register (HWCR) 


The Hardware Configuration Register (HWCR) is a MSR that 
contains configuration bits that enable cache, branch tracing, 
write allocation, debug, and clock control functions. The 
WRMSR and RDMSR instructions access the HWCR when the 
ECX register contains the value 83h, as described on page 90. 
Figure 2 and Table 11 show the format and fields of the HWCR. 

Symbol   Description                          Bits
         Reserved
DDC      Disable Data Cache                   7
DIC      Disable Instruction Cache            6
DBP      Disable Branch Prediction            5
WA       Write Allocate Enable                4
DC       Debug Control                        3-1
           000 Off
           001 Enable branch trace messages
DSPC     Disable Stopping Processor Clocks    0


Figure 2. Hardware Configuration Register (HWCR) 
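
The bit assignments in Figure 2 translate directly into masks. The sketch below shows one way to compose a new HWCR value from a value previously read with RDMSR (ECX = 83h). The macro and function names are hypothetical, and the code only manipulates the value; issuing RDMSR and WRMSR themselves requires privileged code and is not shown.

```c
#include <stdint.h>

#define HWCR_MSR      0x83u        /* ECX value for RDMSR/WRMSR             */
#define HWCR_DSPC     (1u << 0)    /* Disable Stopping Processor Clocks     */
#define HWCR_DC_SHIFT 1            /* Debug Control field, bits 3-1         */
#define HWCR_DC_MASK  (7u << HWCR_DC_SHIFT)
#define HWCR_WA       (1u << 4)    /* Write Allocate Enable                 */
#define HWCR_DBP      (1u << 5)    /* Disable Branch Prediction             */
#define HWCR_DIC      (1u << 6)    /* Disable Instruction Cache             */
#define HWCR_DDC      (1u << 7)    /* Disable Data Cache                    */

/* Replace the 3-bit Debug Control field (000 = off, 001 = branch trace). */
static uint32_t hwcr_set_debug_control(uint32_t hwcr, unsigned dc)
{
    return (hwcr & ~HWCR_DC_MASK) | ((dc & 7u) << HWCR_DC_SHIFT);
}

/* Set the Write Allocate Enable bit, leaving other bits unchanged. */
static uint32_t hwcr_enable_write_allocate(uint32_t hwcr)
{
    return hwcr | HWCR_WA;
}
```

Using read-modify-write helpers like these keeps reserved bits at whatever value was read.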


Table 11. Hardware Configuration Register (HWCR) Fields

Bit | Mnemonic | Description | Function
 | | reserved |
7 | DDC | Disable Data Cache | Disables data cache. 0 = enabled, 1 = disabled
6 | DIC | Disable Instruction Cache | Disables instruction cache. 0 = enabled, 1 = disabled
5 | DBP | Disable Branch Prediction | Disables branch prediction. 0 = enabled, 1 = disabled
4 | WA* | Enable Write Allocate | Enables write allocation. 0 = disabled, 1 = enabled
3-1 | DC | Debug Control | Debug control bits: 000 = Off (disable HWCR debug control); 001 = Enable branch-tracing messages (see “Branch Tracing” on page 39); 010 = reserved; 011 = reserved; 100 = reserved; 101 = reserved; 110 = reserved; 111 = reserved
0 | DSPC | Disable Stopping Processor Clocks | Disables stopping of internal processor clocks in the Halt and Stop Grant states. 0 = enabled, 1 = disabled

Note:
* The AMD-K5 processor supports write allocate only on Models 1, 2, and 3, with a Stepping of 4 or greater.


Built-In Self-Test (BIST)








The processor supports the following types of built-in self-test: 


■ Normal BIST: A built-in self-test mode typically used to test
system functions after RESET


■ Test Access Port (TAP) BIST: A self-test mode started by the
TAP instruction, RUNBIST


All internal arrays except the TLB are tested in parallel by
hardware. The TLB is tested by microcode. The AMD-K5
processor does not report parity errors on IERR for every cache
or TLB access. Instead, the AMD-K5 fully tests its caches during
the BIST. EADS should not be asserted during a BIST. The
AMD-K5 accesses the physical tag array during BISTs, and
these accesses can conflict with inquire cycles.

The normal BIST is invoked if INIT is asserted at the falling
edge of RESET. The BIST runs tests on the internal hardware 
that exercise the following resources: 


■ Instruction cache:
   • Linear tag directory
   • Instruction array
   • Physical tag directory
■ Data cache:
   • Linear tag directory
   • Data array
   • Physical tag directory
■ Entry-point and instruction-decode PLAs
■ Microcode ROM
■ TLB


The BIST runs a linear feedback shift register (LFSR) signature 
test on the microcode ROM in parallel with a March C test on 
the instruction cache, data cache, and physical tags. This is 
followed by the March C test on the TLB arrays and an LFSR 
signature test on the PLA, in that order. Upon completion of 
the PLA test, the processor transfers the test result from an 
internal Hardware Debug Test (HDT) data register to the EAX 
register for external access, resets the internal microcode, and 
begins normal code fetching. 


The result of the BIST can be accessed by reading the lower 9 
bits of the EAX register. If the EAX register value is 
0000_0000h, the test completed successfully. If the value is not 
zero, the non-zero bits indicate where the failure occurred, as 
shown in Table 12. The processor continues with its normal 
boot process after the BIST is completed, whether the BIST 
passed or failed. 
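
A BIOS can evaluate the BIST result with a few lines of code. The sketch below is an illustration with hypothetical names; it assumes the value of EAX was captured immediately after RESET, before any other code modified the register. It reports pass or fail, isolates the lower 9 result bits, and counts how many failure bits are set.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIST_RESULT_MASK 0x1FFu   /* lower 9 bits of EAX hold the result */

/* BIST passed only if the whole EAX register is zero. */
static bool bist_passed(uint32_t eax)
{
    return eax == 0;
}

/* Failure bits as defined in Table 12 (zero when nothing failed). */
static uint32_t bist_failure_bits(uint32_t eax)
{
    return eax & BIST_RESULT_MASK;
}

/* Number of failure bits that are set in the result. */
static unsigned bist_failure_count(uint32_t eax)
{
    unsigned n = 0;
    for (uint32_t bits = bist_failure_bits(eax); bits != 0; bits >>= 1)
        n += bits & 1u;
    return n;
}
```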


Table 12. BIST Error Bit Definition in EAX Register

Bit | Bit Value = 0 | Bit Value = 1
8 | No Error | Data path
7 | No Error | Instruction-cache instructions
6 | No Error | Instruction-cache linear tags
5 | No Error | Data-cache linear tags
4 | No Error | PLA
3 | No Error |
2 | No Error | Data-cache data
1 | No Error | Instruction-cache physical tags
0 | No Error | Data-cache physical tags


Test Access Port (TAP) BIST


The TAP BIST performs all the functions of the normal BIST, up
to and including the PLA signature test, in exactly the same
manner as the normal BIST. However, after the PLA test, the
test result is not transferred to the EAX register.


The TAP BIST is started by loading and executing the
RUNBIST instruction in the test access port, as described in
“Boundary Scan Architecture Support” on page 41. When the
RUNBIST instruction is executed, the processor enters into a
reset mode that is identical to that entered when the RESET
signal is asserted. Upon completion of the TAP BIST, the result
remains in the BIST result register for shifting out through the
TDO signal. The TRST signal must be asserted, or the TAP
instruction must be changed, to exit TAP BIST and return to
normal operation.


Output-Float Test

The Output-Float Test mode is entered if FLUSH is asserted 
before the falling edge of RESET. This causes the processor to 
place all of its output and bidirectional signals in the 
high-impedance state. In this isolated state, system board 
traces and connections can be tested for integrity and 
driveability. The Output-Float Test mode can only be exited by 
asserting RESET again. 


On the AMD-K5 processor and Pentium, FLUSH is an 
edge-triggered interrupt. On the 486 processor, however, the 
signal is a level-sensitive input. 
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Cache and TLB Testing 


AMD-K5™ Processor 


The internal cache for the AMD-K5 processor is divided into 
two caches—a 16-Kbyte, 4-way, set-associative instruction 
cache and an 8-Kbyte, 4-way, set-associative data cache. Cache 
and TLB testing is often done by the BIOS or operating system 
during power-up. 


Note: The AMD-K6 MMX processor does not contain these 
features. The AMD-K6 processor contains a built-in self-test 
for all internal memories. 


The individual locations of all SRAM arrays on the AMD-K5 
processor are accessible with the RDMSR and WRMSR 
instructions. To access an array location, set up the Array 
Access MSR code (82h) in ECX, and the array pointer (see 
page 28) in EDX. EAX holds the data to be read or written. 
Tests can be performed on the following arrays: 


■ Data Cache: 8-Kbyte, 4-way, set-associative 
   • Data array 
   • Linear-tag array 
   • Physical-tag array 
■ Instruction Cache: 16-Kbyte, 4-way, set-associative 
   • Instruction array 
   • Linear-tag array 
   • Physical-tag array 
   • Valid-bit array 
   • Branch-prediction bit array 
■ 4-Kbyte TLB: 128-entry, 4-way, set-associative 
   • Linear-tag array 
   • Page array 
■ 4-Mbyte TLB: 4-entry, fully associative 
   • Linear-tag array 
   • Page array 


Array Access Register (AAR) 


The 64-bit Array Access Register (AAR) is an MSR that contains 
a 32-bit array pointer that identifies the array location to be 
tested and 32 bits of array test data to be read or written. The 
WRMSR and RDMSR instructions access the AAR when the 
ECX register contains the value 82h, as described on page 90. 
Figure 3 shows the format of the AAR. 


Bits 63-32: Array Pointer (Contents of EDX) 
Bits 31-0: Array Data (Contents of EAX) 
MSR 82h 


Figure 3. Array Access Register (AAR) 


Array Pointer 
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To read or write an array location, perform the following steps: 
1. ECX—Enter 82h into ECX to access the 64-bit AAR. 


2. EDX—Enter a 32-bit array pointer into EDX, as shown in 
Figures 4 through 12 (top). 


3. EAX—Read or write 32 bits of array test data to or from 
EAX, as shown in Figures 4 through 12 (bottom). 


The array pointers entered in EDX (Figures 4 through 12, top) 
specify particular array locations. For example, in the data- and 
instruction-cache arrays, the way (or column) and set (or index) 
in the array pointer specify a cache line in the 4-way, 
set-associative array. The array pointers for data-cache data 
and instruction-cache instructions also specify a dword location 
within that cache line. In the data cache, this dword is 32 bits of 
data; in the instruction cache, this dword is two instruction 
bytes plus their associated pre-decode bits. For the 4-Kbyte 
TLB, the way and set specify one of the 128 TLB entries. In the 
4-Mbyte TLB, one of only four entries is specified. 
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Bits 7-0 of every array pointer encode the array ID, which 
identifies the array to be accessed, as shown in Table 13. To 
simplify multiple accesses to an array, the contents of EDX are 
retained after the RDMSR instruction executes (EDX is 
normally cleared after a RDMSR instruction). 


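Table 13. Array IDs in Array Pointers 


| Array Pointer Bits 7-0 | Accessed Array |
| E0h | Data-cache data |
| E1h | Data-cache linear tags |
| E4h | Instruction-cache instructions |
| E5h | Instruction-cache linear tags |
| E6h | Instruction-cache valid bits |
| E7h | Instruction-cache branch-prediction bits |
| E8h | 4-Kbyte TLB page and status |
| E9h | 4-Kbyte TLB virtual tag |
| EAh | 4-Mbyte TLB page and status |
| EBh | 4-Mbyte TLB virtual tag |
| ECh | Data-cache physical tags |
| EDh | Instruction-cache physical tags |

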


EAX specifies the test data to be read or written with the 
RDMSR or WRMSR instruction (see Figures 4 through 12). For 
example, in Figure 4 (top) the array pointer in EDX specifies a 
way and set within the data-cache linear tag array (E1h in bits 
7-0 of the array pointer) or the physical tag array (ECh in bits 
7-0 of the array pointer). If the linear tag array (E1h) is 
accessed, the data read or written includes the tag and the 
status bits. The details of the valid fields in EAX are 
proprietary. 
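

To make the register roles concrete, here is a minimal sketch that only computes the ECX, EDX, and EAX values an AAR access would use; it does not execute RDMSR or WRMSR. The helper names (`make_pointer`, `aar_write_registers`, `aar_value`) are hypothetical. The array IDs are those named in the test-format figures, and the bits above the array ID are passed through untouched because their exact layout depends on the array and processor model.

```python
AAR_MSR = 0x82  # value loaded into ECX to select the Array Access Register

# Array IDs as they appear in the test-format figure captions.
ARRAY_IDS = {
    "dcache_data": 0xE0,
    "dcache_linear_tag": 0xE1,
    "icache_instructions": 0xE4,
    "icache_linear_tag": 0xE5,
    "icache_valid_bits": 0xE6,
    "icache_branch_prediction": 0xE7,
    "tlb4k_page": 0xE8,
    "tlb4k_virtual_tag": 0xE9,
    "tlb4m_page": 0xEA,
    "tlb4m_virtual_tag": 0xEB,
    "dcache_physical_tag": 0xEC,
    "icache_physical_tag": 0xED,
}


def make_pointer(array_name, location_bits=0):
    """Place the array ID in bits 7-0; location_bits supplies bits 31-8 as-is."""
    return (location_bits & 0xFFFFFF00) | ARRAY_IDS[array_name]


def aar_write_registers(pointer, data):
    """Register values a WRMSR to the AAR would use (nothing is executed)."""
    return {"ECX": AAR_MSR, "EDX": pointer & 0xFFFFFFFF, "EAX": data & 0xFFFFFFFF}


def aar_value(pointer, data):
    """The 64-bit AAR image: EDX in the upper half, EAX in the lower half."""
    return ((pointer & 0xFFFFFFFF) << 32) | (data & 0xFFFFFFFF)
```

Keeping the array ID separate from the location bits mirrors the text: bits 7-0 always select the array, while the remaining fields select a location within it.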
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EDX: Array Pointer 
Bits 7-0: Array ID (E1h, ECh) 


EAX: Test Data 

(E1h) Linear Tag 

(ECh) Physical Tag, including the MESI state: 
00 = Invalid, 01 = Shared 
10 = Modified, 11 = Exclusive 


Figure 4. Test Formats: Dcache Tags for the AMD-K5™ Processor Model 0 
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EDX: Array Pointer 
Bits 7-0: Array ID (E1h, ECh) 


EAX: Test Data 

(E1h) Linear Tag, with status bits (including a user/supervisor bit) 

(ECh) Physical Tag, including the MESI state: 
00 = Invalid, 01 = Shared 
10 = Modified, 11 = Exclusive 


Figure 5. Test Formats: Dcache Tags for the AMD-K5™ Processor Model 1 and Greater 
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EDX: Array Pointer 


Data Array Index; bits 7-0: Array ID (E0h) 


EAX: Test Data 

(E0h) Data 


Figure 6. Test Formats: Dcache Data for All Models of the AMD-K5™ Processor 
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EDX: Array Pointer 


Bits 7-0: Array ID (E5h, EDh, E6h, E7h) 


EAX: Test Data 

(E5h) Linear Tag: Linear Address 
(EDh) Physical Tag: Tag (Physical Address 31-11) 
(E6h) Valid Bits: Byte Valid Bits 
(E7h) Branch-Prediction Bits: Predicted Taken; Byte Offset Within 
Block of Last Byte of Predicted Branch Instruction; Column of 
Predicted Target; Index of Predicted Target; Target Byte 


Figure 7. Test Formats: Icache Tags for the AMD-K5™ Processor Model 0 
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EDX: Array Pointer 


Icache Index for All Icache Arrays; bits 7-0: Array ID (E5h, EDh, E6h, E7h) 


EAX: Test Data 

(E5h) Linear Tag: Linear Address 
(EDh) Physical Tag 
(E6h) Valid Bits 
(E7h) Branch-Prediction Bits: Predicted Taken; Byte Offset Within 
Block of Last Byte of Predicted Branch; Column of Predicted 
Target; Index of Predicted Target; Target Byte 


Figure 8. Test Formats: Icache Tags for the AMD-K5™ Processor Model 1 and Greater 
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EDX: Array Pointer 


Opcode Bytes; bits 7-0: Array ID (E4h) 


EAX: Test Data 

(E4h) Instruction Bytes: two instruction bytes, each with its 
pre-decode fields (Prefix, Start Bit, End Bit, Opcode Bit, and 
Map ROPS/MROM) 


Figure 9. Test Formats: Icache Instructions for the AMD-K5™ Processor Model 0 


EDX: Array Pointer 


Icache Index for All Icache Arrays; bits 7-0: Array ID (E4h) 

Packet: 0/1 = low/high. Low: Bytes 0-7 and 8-15. High: Bytes 16-23 and 24-31. 


EAX: Test Data 

(E4h) Instruction Bytes: Byte (n + 8) and Byte (n), each with its 
pre-decode fields (Prefix, Start Bit, End Bit, Opcode Bit, and 
Map ROPS/MROM) 


Figure 10. Test Formats: Icache Instructions for the AMD-K5™ Processor Model 1 and Greater 
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EDX: Array Pointer 


Bits 7-0: Array ID 


EAX: Test Data 

(E8h) 4-Kbyte Page and Status: Page Frame Address 
(E9h) 4-Kbyte Virtual Tag: Tag (Virtual Address 31-17) 


Symbol Description Bits 
GV Global Valid Bit 19 
D Dirty Bit 18 
U/S User/Supervisor Bit 17 
R/W Read or Write Bit 16 
V Valid Bit 15 


Figure 11. Test Formats: 4-Kbyte TLB for All Models of the AMD-K5™ Processor 
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EDX: Array Pointer 
Bits 7-0: Array ID (EAh, EBh) 


EAX: Test Data 

(EAh) 4-Mbyte Page and Status 
(EBh) 4-Mbyte Virtual Tag 


Symbol Description Bits 
GV Global Valid Bit 14 
D Dirty Bit 13 
U/S User/Supervisor Bit 12 
R/W Read or Write Bit 11 
V Valid Bit 10 


Figure 12. Test Formats: 4-Mbyte TLB for All Models of the AMD-K5™ Processor 
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Debug Registers 


The processor implements the standard debug functions and 
registers (DR7-DR6 and DR3-DR0, often called DR7-DR0) 
available on the 486 processor, plus an I/O breakpoint 
extension. 


Standard Debug Functions 


The debug functions make the processor’s state visible to 
debug software through four debug registers (DR3-DR0) that 
are accessed by MOV instructions. Accesses to memory 
addresses can be set as breakpoints in the instruction flow by 
invoking one of two debug exceptions (interrupt vectors 1 or 3) 
during instruction or data accesses to the addresses. The debug 
functions eliminate the need to embed breakpoints in code and 
allow debugging of ROM as well as RAM. 


For details on the standard 486 debug functions and registers, 
see the AMD documentation on the Am486® processor or other 
commercial x86 literature. 


I/O Breakpoint Extension 


The processor supports an I/O breakpoint extension for 
breakpoints on I/O reads and writes. This function is enabled by 
setting bit 3 of CR4, as described in “Control Register 4 (CR4) 
Extensions” on page 58. When enabled, the I/O breakpoint 
function is invoked by the following: 


■ Entering the I/O port number as a breakpoint address 
(zero-extended to 32 bits) in one of the breakpoint registers, 
DR3-DR0 


■ Entering the bit pattern, 10b, in the corresponding 2-bit 
read-write (R/W) field in DR7 


All data breakpoints on the AMD-K5 processor are precise, 
including those encountered in repeated string operations. The 
trap is taken after completing the iteration on which the 
breakpoint match occurs. 
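

The I/O breakpoint setup described above can be sketched as a pure computation of register values. The function `io_breakpoint_setup` is hypothetical. The DR7 field positions used here (local-enable bit at 2n, and the R/W and LEN fields for breakpoint n starting at bits 16 + 4n and 18 + 4n) are an assumption taken from the standard x86 debug-register layout, which this document does not spell out.

```python
CR4_DE = 1 << 3  # bit 3 of CR4 enables the I/O breakpoint extension


def io_breakpoint_setup(index, port, dr7=0, cr4=0, length=0b00):
    """Compute CR4, DRn, and DR7 values for an I/O breakpoint on `port`.

    Nothing is written to hardware; the caller would load the returned
    values with MOV instructions at privilege level 0.
    """
    if not 0 <= index <= 3:
        raise ValueError("breakpoint index must be 0-3")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("I/O port must fit in 16 bits")
    rw_shift = 16 + 4 * index   # assumed standard x86 DR7 layout
    len_shift = 18 + 4 * index
    dr7 &= ~(0b1111 << rw_shift)          # clear R/W and LEN for this slot
    dr7 |= 0b10 << rw_shift               # 10b selects I/O reads and writes
    dr7 |= (length & 0b11) << len_shift
    dr7 |= 1 << (2 * index)               # local enable for this breakpoint
    return {"CR4": cr4 | CR4_DE, f"DR{index}": port, "DR7": dr7}
```

For example, `io_breakpoint_setup(0, 0x3F8)` yields a zero-extended port number for DR0 and a DR7 image with the 10b pattern in the first R/W field.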


Enabled breakpoints slow the processor somewhat. When a 
data breakpoint is enabled, the processor disables its dual-issue 
load/store operations and performs only single-issue load/store 
operations. When an instruction breakpoint is enabled, 
instruction issue is completely serialized. 


Debug Compatibility with Pentium 


The differences in debug functions between the AMD-K5 
processor and Pentium are described in Appendix A of the 
AMD-K5™ Processor Technical Reference Manual, order# 18524. 


Branch Tracing 


Branch tracing is enabled by writing bits 3-1 with 001b and 
setting bit 5 to 1 (disabling branch prediction) in the Hardware 
Configuration Register (HWCR), as described on page 22. 
When thus enabled, the processor drives two branch-trace 
message special bus cycles immediately after each taken 
branch instruction is executed. Both special bus cycles have a 
BE7-BE0 encoding of DFh (1101_1111b). The first special bus 
cycle identifies the branch source, the second identifies the 
branch target. The contents of the address and data bus during 
these special bus cycles are shown in Table 14. 
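

A minimal sketch of the HWCR update described above, assuming the current HWCR value has already been read: it writes 001b into bits 3-1 and sets bit 5, leaving every other bit unchanged. The function name `enable_branch_trace` is hypothetical, and the MSR read and write themselves are not shown.

```python
BRANCH_TRACE_BE = 0xDF  # BE7-BE0 encoding of both special bus cycles


def enable_branch_trace(hwcr):
    """Return an HWCR image with branch tracing enabled."""
    hwcr &= ~(0b111 << 1)   # clear bits 3-1
    hwcr |= 0b001 << 1      # write 001b into bits 3-1
    hwcr |= 1 << 5          # set bit 5 (disables branch prediction)
    return hwcr
```

Clearing the 3-bit field before writing it keeps the result correct even if a different value was previously stored in bits 3-1.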


The branch-trace message special bus cycles are different for 
the AMD-K5 processor and Pentium, although their BE7-BE0 
encodings are the same. 


Table 14. Branch-Trace Message Special Bus Cycle Fields 


| Signals | First Special Bus Cycle | Second Special Bus Cycle |
| ... | 0 = First special bus cycle (source) | 1 = Second special bus cycle (target) |
| A30-A29 | Not valid | Operating Mode of Target: 11 = Virtual-8086 Mode, 10 = Protected Mode, 01 = Not valid, 00 = Real Mode |
| ... | ... | Default Operand Size of Target Segment: 1 = 32-bit |
| ... | Code Segment (CS) selector of Branch Source | ... |
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Functional-Redundancy Checking 
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When FRCMC is asserted at RESET, the processor enters 
Functional-Redundancy Checking mode as the checker and 
reports checking errors on the IERR output. If FRCMC is 
negated at RESET, the processor operates normally, although 
it also behaves as the master in a functional-redundancy 
checking arrangement with a checker. 


In the Functional-Redundancy Checking mode, two processors 
have their signals tied together. One processor (the master) 
operates normally. The other processor (the checker) has its 
output and bidirectional signals (except for TDO and IERR) 
floated to detect the state of the master’s signals. The master 
controls instruction fetching and the checker mimics its 
behavior by sampling the fetched instructions as they appear 
on the bus. Both processors execute the instructions in lock 
step. The checker compares the state of the master’s output and 
bidirectional signals with the state that the checker itself 
would have driven for the same instruction stream. 


Errors detected by the checker are reported on the IERR 
output of the checker. If a mismatch occurs on such a 
comparison, the checker asserts IERR for one clock, two clocks 
after the detection of the error. Both the master and the 
checker continue running the checking program after an error 
occurs. No action other than the assertion of IERR is taken by 
the processor. On the AMD-K5 processor, the IERR output is 
reserved solely for functional-redundancy checking. No other 
errors are reported on that output. 


Functional-redundancy checking is typically implemented on 
single-processor, fault-monitoring systems (which have two 
processors). The master processor runs the operational 
programs and the checker processor is dedicated entirely to 
constant checking. In this arrangement, the accurate operation 
test consists solely of reporting one or more errors. The 
particular error type or the instruction causing an error is not 
reported. The arrangement works because the processor is 
entirely deterministic. Speculative prefetching, speculative 
execution, and cache replacement all occur in identical ways 
and at identical times on both processors if their signals are 
tied together so that they run the same program. 
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The Functional-Redundancy Checking mode can only be exited 
by the assertion of RESET. Functional-redundancy checking 
cannot be performed in the Hardware Debug Tool (HDT) mode. 
The assertion of FRCMC is not recognized while PRDY is 
asserted. 


Boundary Scan Architecture Support 


AMD-K5™ Processor 


The AMD-K5 processor provides test features compatible with 
the Standard Test Access Port (TAP) and Boundary Scan Test 
Architecture as defined in the IEEE 1149.1-1990 JTAG 
Specification. The subsections in this topic include: 


■ Boundary Scan Test Functional Description 
■ Boundary Scan Architecture 
■ Registers 
■ The Test Access Port (TAP) Controller 
■ JTAG Register Organization 
■ JTAG Instructions 


The external TAP interface consists of five pins: 

■ TCK: The Test Clock input provides the clock for the JTAG 
test logic. 

■ TMS: The Test Mode Select input enables TAP controller 
operations. 

■ TDI: The Test Data Input provides serial input to registers. 

■ TDO: The Test Data Output provides serial output from the 
registers; the signal is tri-stated except when in the Shift-DR 
or Shift-IR controller states. 

■ TRST: The TAP Controller Reset input initializes the TAP 
controller when asserted Low. 


The internal JTAG logic contains the elements listed below: 


■ The Test Access Port (TAP) Controller: Decodes the inputs 
on the Test Mode Select (TMS) line to control test 
operations. The TAP is a general-purpose port that provides 
access to the test support functions built into the AMD-K5. 


■ Instruction Register: Accepts instructions from the Test 
Data Input (TDI) pin. The instruction codes select the 
specific test or debug operation to be performed or the test 
data register to be accessed. 


■ Implemented Test Data Registers: Boundary Scan 
Register, Device Identification Register, and Bypass 
Register. See “JTAG Register Organization” on page 44 for 
more information. 


Note: See Table 18 on page 49 for more information. 


Boundary Scan Test Functional Description 


The boundary scan testing uses a shift register, contained in a 
boundary scan cell, located between the core logic and the I/O 
buffers adjacent to each component pin. Signals at each input 
and output pin are controlled and observed using scan testing 
techniques. The boundary scan cells are interconnected to form 
a shift register chain. This register chain, called a Boundary 
Scan Register (BSR), constructs a serial path surrounding the 
core logic, enabling test data to be shifted through the 
boundary scan path. When the system enters the Boundary 
Scan Test mode, the BSR chain is directed by a test program to 
pass data along the shift register path. 


If all the components used to construct a circuit or PCB contain 
a boundary scan cell architecture, the resulting serial path can 
be used to perform component interconnect testing. 


Boundary Scan Architecture 


Boundary Scan architecture has four basic elements: 


■ Test Access Port (TAP) 

■ TAP Controller 

■ Instruction Register (IR). See “Instruction Register” on 
page 44 for more information. 

■ Test Data Registers. See “Registers” on page 43 for more 
information. 


The Instruction and Test Data Registers have separate shift 
register access paths connected in parallel between the Test 
Data In (TDI) and Test Data Out (TDO) pins. Path selection and 
boundary scan cell operation is controlled by the TAP 
Controller. The controller initializes at start-up, but the Test 
Reset (TRST) input can asynchronously reset the test logic, if 
required. 


All system integrated circuit (IC) I/O signals are shifted in and 
out through the serial Test Data In (TDI) and Test Data Out 
(TDO) path. The TAP Controller is enabled by the Test Mode 
Select (TMS) input. The Test Clock (TCK), obtained from a 
system level bus or Automatic Test Equipment (ATE), supplies 
the timing signal for data transfer and system architecture 
operation. 


The dedicated TCK input enables the serial test data path 
between components to be used independently of 
component-specific system clocks. TCK also ensures that test 
data can be moved to or from a chip without changing the state 
of the on-chip system logic. 


The TCK signal is driven by an independent 50% duty cycle 
clock (generated by the Automatic Test Equipment). If the TCK 
must be stopped (for example, if the ATE must retrieve data 
from external memory and is unable to keep the clock running), 
it can be stopped at 0 or 1 indefinitely, without causing any 
change to the test logic state. 


To ensure race-free operation, changes on the TAP’s TMS input 
are clocked into the test logic. Changes on the TAP’s TDI input 
are clocked into the selected register (Instruction or Test Data 
Register) on the rising edge of TCK. The contents of the 
selected register are shifted out onto the TAP output (TDO) on 
the falling edge of TCK. 


Boundary scan architectural elements include an Instruction 
Register (IR) and a group of Test Data Registers (TDRs). These 
registers have separate shift-register-based serial access paths 
connected in parallel between the TDI and TDO pins. 


The TDRs are internal registers used by the Boundary Scan 
Architecture to process the test data. Each Test Data Register 
is addressed by an instruction scanned into the Instruction 
Register. The AMD-K5 processor includes the following TDRs: 


■ Bypass Register (BR). See “Bypass Register” on page 45. 

■ Boundary Scan Register (BSR). See “Boundary Scan 
Register” on page 44. 

■ Device Identification Register (DIR). See “Device 
Identification Register” on page 45. 

■ Built-In Self-Test Result Register (BISTRR). See 
“RUNBIST” on page 48. 
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JTAG Register 
Organization 


Instruction Register. The 5-bit Instruction Register (IR) is a 
serial-in parallel-out register that includes five shift 
register-based cells for holding instruction data. The 
instruction determines which test to run, which data register to 
access, or both. When the TAP controller enters the Capture-IR 
state, the processor loads the IDCODE instruction in the IR. 
Executing Shift-IR starts instructions shifting into the 
instruction register on the rising edge of TCK. Executing 
Update-IR loads the instruction from the serial shift register to 
the parallel register. 


The TAP controller is a synchronous, finite-state machine that 
controls the test and debug logic sequence of operations. The 
TAP controller changes state in response to the rising edge of 
TCK and defaults to the test logic reset state at power-up. 
Reinitialization to the test logic reset state is accomplished by 
holding the TMS pin High for five TCK periods. 
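

The reset behavior can be checked against a small model of the TAP controller. The sketch below encodes the 16-state transition table from IEEE 1149.1 (an assumption here, since this document does not reproduce the state diagram) and shows that five TCK periods with TMS High reach the test logic reset state from any starting state. The names `TAP_NEXT`, `step`, and `run` are hypothetical.

```python
# state -> (next state when TMS = 0, next state when TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance one rising edge of TCK with the given TMS level."""
    return TAP_NEXT[state][1 if tms else 0]


def run(state, tms_sequence):
    """Apply a sequence of TMS levels, one per TCK period."""
    for tms in tms_sequence:
        state = step(state, tms)
    return state
```

From Run-Test/Idle, the TMS sequence 1, 1, 0, 0 walks the model into Shift-IR, which is the state in which an instruction such as RUNBIST would be shifted in.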


All registers in the JTAG logic consist of the following two 
register ranks: 


■ Shift register 
■ Parallel output register fed by the shift register 


Parallel input data is loaded into the shift register when the 
TAP controller exits the Capture state (Capture-DR or 
Capture-IR). The shift register then shifts data from TDI to 
TDO when in the Shift state (Shift-DR or Shift-IR). The output 
register holds the current data while new data is shifted into 
the shift register. The contents of the output register are 
updated when the TAP controller exits the Update state 
(Update-DR or Update-IR). The following three registers are 
described in this section: 


■ Boundary Scan Register 
■ Device Identification Register 
■ Bypass Register 


Boundary Scan Register. The Boundary Scan Register (BSR) is a 
261-bit shift register with cells connected to all input and 
output pins and containing cells for tri-state I/O control. This 
arrangement enables serial data to be loaded into or read from 
the processor boundary scan area. 
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Output cells determine the value of the signal driven on the 
corresponding pin. Input cells only capture data. The EXTEST 
and SAMPLE/PRELOAD instructions can operate the BSR. 


Device Identification Register. The format of the Device 
Identification Register (DIR) is shown in Table 15. The fields 
include the following values: 


■ Version Number: This field is incremented by AMD 
manufacturing for each major revision of silicon. 


■ Bond Option: The two bits of the bond option depend on 
how the part is bonded at the factory. 


■ Part Number: This field identifies the specific processor 
model. 


■ Manufacturer: This field is actually only 11 bits (11-1). The 
least-significant bit, bit 0, is always set to 1, as specified by 
the IEEE standard. 


Table 15. AMD-K5™ Processor Device Identification Register 


| Version (Bits 31-28) | Bond Option (Bits 27-26) | Part Number (Bits 25-12) | Manufacturer (Bits 11-1) | LSB (Bit 0) |
| ... | ... | ... | 00000000001b | 1b |
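

The field boundaries in Table 15 translate directly into shifts and masks. The following sketch splits a 32-bit value shifted out of the Device Identification Register into its fields. The function name `decode_idcode` is hypothetical, and any sample value passed to it is made up for illustration, not an actual AMD-K5 identification code.

```python
def decode_idcode(idcode):
    """Split a 32-bit device identification value into the Table 15 fields."""
    return {
        "version": (idcode >> 28) & 0xF,         # bits 31-28
        "bond_option": (idcode >> 26) & 0x3,     # bits 27-26
        "part_number": (idcode >> 12) & 0x3FFF,  # bits 25-12
        "manufacturer": (idcode >> 1) & 0x7FF,   # bits 11-1
        "lsb": idcode & 0x1,                     # bit 0, always 1
    }
```

A decoded `lsb` of 0 would suggest that the value was shifted out of a Bypass Register rather than a Device Identification Register.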


Bypass Register. The Bypass Register, a 1-bit shift register, 
provides the shortest path between TDI and TDO. When the 
component is not performing a test operation, this path is 
selected to allow transfer of test data to and from other 
components on the board. The Bypass Register is also selected 
during the HIGHZ, ALL1, ALL0, and BYPASS tests and for any 
unused instruction codes. 










The processor supports all three IEEE-mandatory instructions 
(BYPASS, SAMPLE/PRELOAD, EXTEST), three IEEE-optional 
instructions (IDCODE, HIGHZ, RUNBIST), and three 
instructions unique to the AMD-K5 processor (ALL1, ALL0, 
USEHDT). Table 16 shows the complete set of public TAP 
instructions supported by the processor. The AMD-K5 also 
implements several private manufacturing test instructions. 


The IEEE standard describes the mandatory and optional 
instructions. The ALL1 and ALL0 instructions simply force all 
outputs and bidirectionals High or Low. The USEHDT 
instruction is described on page 57. Any instruction encodings 
not shown in Table 16 select the BYPASS instruction. 
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Table 16. Public TAP Instructions 


| Instruction | Encoding | Register | Description |
| EXTEST | ... | BSR | As defined by the IEEE standard |
| SAMPLE/PRELOAD | ... | BSR | As defined by the IEEE standard |
| IDCODE | ... | DIR | As defined by the IEEE standard |
| HIGHZ | ... | BR | As defined by the IEEE standard |
| ALL1 | 00100 | BR | Forces all outputs and bidirectionals High |
| ALL0 | ... | BR | Forces all outputs and bidirectionals Low |
| USEHDT | ... | HDTR | Accesses the Hardware Debug Tool (HDT). See page 57 |
| RUNBIST | ... | BISTRR | As defined by the IEEE standard |
| BYPASS | ... | BR | As defined by the IEEE standard |
| BYPASS | undefined | BR | Undefined instruction encodings select the BYPASS instruction |


EXTEST. The EXTEST instruction permits circuits outside the 
component package to be tested. A common use of the EXTEST 
instruction is the testing of board interconnects. Boundary scan 
register cells at output pins are used to apply test stimuli, while 
those at input pins capture test results. Depending on the value 
loaded into their control cells in the boundary scan register, the 
I/O pins are established as input or output. Inputs to the core 
logic retain the logic value set prior to execution of the 
EXTEST instruction. Upon exiting EXTEST, input pins are 
reconnected to the package pins. 








SAMPLE/PRELOAD. There are two functions performed by the 
SAMPLE/PRELOAD instruction, as follows: 


■ Capturing an instantaneous picture of the normal operation 
of the device being tested. This function occurs if the 
instruction is executed while the TAP controller is in the 
Capture-DR state and causes the Boundary Scan Register to 
sample the values present at the device pins. 


■ Preloading data to the device pins to be driven to the board 
by the EXTEST instruction. This function occurs if the 
instruction is executed while the TAP controller is in the 
Update-DR state and causes data to be preloaded to the 
device pins from the Boundary Scan Register. 
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IDCODE. The execution of the IDCODE instruction connects the 
device identification register between TDI and TDO. Upon 
such connection, the device identification code can be shifted 
out of the register. 


HIGHZ. This instruction forces all output and bidirectional pins 
into a tri-state condition. When this instruction is selected, the 
bypass register is selected for shifting between TDI and TDO. A 
signal called HIZEXT is responsible for forcing the tri-state to 
occur. This signal is generated in the TAP block, underneath 
JTAG_BIST, and goes to the PAD_TOP block. 


ALL1. This instruction forces all output and bidirectional pins to 
a High logic level. 


The ALL1 instruction, like the HIGHZ instruction, selects the 
bypass register for shifting between TDI and TDO. A signal 
called ALL1 is responsible for forcing the pins to a High state. 
This signal is generated in the TAP block underneath 
JTAG_BIST and goes to the PAD_TOP block. In the PAD_TOP 
block, this signal goes to boundary scan cells called 
BSLCD_OUT. The DOUT pins of the BSLCD_OUT cells are 
forced High when ALL1 is High. The SELPDR signal selects the 
boundary scan cells as the source for driving the outputs if the 
SELPDR signal is High. The SELPDR signal is also generated in 
the TAP block underneath JTAG_BIST and goes to the 
PAD_TOP block. 


ALL0. This instruction forces all output and bidirectional pins to 
a Low logic level. 


The ALL0 instruction, like the HIGHZ instruction, selects the 
bypass register for shifting between TDI and TDO. A signal 
called ALL0 is responsible for forcing the pins to a Low state. 
This signal is generated in the TAP block underneath 
JTAG_BIST and goes to the PAD_TOP block. In the PAD_TOP 
block, this signal goes to boundary scan cells called 
BSLCD_OUT. The DOUT pins of the BSLCD_OUT cells are 
forced Low when ALL0 is High. The SELPDR signal selects the 
boundary scan cells as the source for driving the outputs if the 
SELPDR signal is High. The SELPDR signal is also generated in 
the TAP block underneath JTAG_BIST and goes to the 
PAD_TOP block. 
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RUNBIST. This version of BIST is similar to the normal BIST 
mode, except RUNBIST is started by shifting in a TAP 
instruction. This instruction should behave according to the 
rules of the IEEE 1149.1 definition of RUNBIST. 


When the RUNBIST instruction is updated into the instruction 
register, a signal from the TAP_RTL block called JTGBIST is 
asserted High. This signal goes to the PAD_TOP and 
TESTCTRL blocks. In PAD_TOP, this signal goes to the 
BRNBIST block and causes both INIT_SAMP and RUNBIST to 
be asserted. To the rest of the processor, it looks like a normal 
BIST operation is taking place. The JTGBIST signal also goes to 
the TESTCTRL block so the BIST controller knows the BIST 
operation was initiated from the TAP controller. This operation 
is necessary because the BIST results do not get transferred to 
the EAX register in this mode of operation. The JTAG_BIST 
block also asserts the RESET_TAP pin to the CLOCKS block for 
15 system clock cycles in order to fake an external reset. 


The pattern that is shifted into the boundary scan ring prior to 
the selection of the RUNBIST instruction is driven at output 
and bidirectional cells during the duration of the instruction. 
The results of the execution of RUNBIST are saved in the BIST 
results register, which is 9 bits long and looks like the least 
significant 9 bits in the EAX register. This register is selected 
for shifting between TDI and TDO and can be shifted out after 
the completion of BIST. Bit 0 (ICACHE data status) is shifted 
out first. The BIST results should be independent of signals 
received at non-clock input pins (except for RESET). 
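

Because bit 0 is shifted out first, test software has to reassemble the 9-bit result in least-significant-bit-first order. The sketch below assumes the nine TDO samples have already been captured into a list, one per shift clock; the function name `collect_bist_result` is hypothetical and the JTAG access itself is not modeled.

```python
BIST_RESULT_BITS = 9  # length of the BIST results register


def collect_bist_result(tdo_bits):
    """Rebuild the BIST result from TDO samples, first sample = bit 0."""
    bits = list(tdo_bits)
    if len(bits) != BIST_RESULT_BITS:
        raise ValueError("expected exactly 9 TDO samples")
    value = 0
    for position, bit in enumerate(bits):
        value |= (bit & 1) << position
    return value
```

The returned value has the same layout as the least significant 9 bits of EAX after a normal BIST, so a result of 0 indicates that no error was recorded.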


BYPASS. The execution of the BYPASS instruction connects the 
bypass register between TDI and TDO, bypassing the test logic. 
Because of the pull-up resistor on the TDI input, the bypass 
register is selected if there is an open circuit in the board-level 
test data path following an instruction scan cycle. Any unused 
instruction bit patterns cause the bypass register to be selected 
for shifting between TDI and TDO. 
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The control bits listed in Table 18 have the characteristics 
described in Table 17. 


Table 17. Control Bit Definitions 


Definition 


Controls the direction of the Data bus (D63-D0). If the bit is set to 1, the 
bus acts as an input. If the bit is set to 0, the bus acts as an output. 


Controls the direction of the Address bus (A31-A3) and Address Parity 
(AP). If the bit is set to 1, the bus acts as an input. If the bit is set to 0, the 
bus acts as an output. 

Controls pins that can be tri-stated, but these pins never act as inputs. If 
the bit is set to 1, the pin is tri-stated. If the bit is set to 0, the pin acts as 
an output. 





Table 18. Boundary Scan Register Bit Definitions 


| Bit | Pin Name | Comments |
| ... | ... | ... |
| 260 | ... | Output Cell |





Hardware Debug Tool (HDT) 


The Hardware Debug Tool (HDT)—sometimes referred to as 
the debug port or Probe Mode—is a collection of signals, 
registers, and processor microcode that is enabled when 
external debug logic drives R/S Low or loads the processor’s 
Test Access Port (TAP) instruction register with the USEHDT 
instruction. 


AMD-K5™ Processor x86 Architecture Extensions 




The AMD-K5 processor is compatible with the instruction set,
programming model, memory management mechanisms, and
other software infrastructure supported by the 486 and
Pentium (735\90, 815\100) processors. Operating system and
application software that runs on the Pentium processor can be
executed on the AMD-K5 processor. Because the AMD-K5 processor
takes a significantly different approach to implementing the x86
architecture, some subtle differences from the Pentium processor
may be visible to system and code developers. These differences are
described in Appendix A of the AMD-K5™ Processor Technical
Reference Manual, order# 18524.


Call AMD at 1-800-222-9323 to order AMD-K5 support 
documents. 


Before implementing the AMD-K5 processor model-specific
features, check CPUID for supported feature flags. See
“CPUID” on page 86 for more information.
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Additions to the EFLAGS Register 


The EFLAGS register on the AMD-K5 processor defines new 
bits in the upper 16 bits of the register to support extensions to 
the operating modes. See “Virtual-8086 Mode Extensions 
(VME)” on page 67 and “CPUID” on page 86 for additional 
information. 


Control Register 4 (CR4) Extensions 


Control Register 4 (CR4) was added on the AMD-K5. The bits in 
this register control the various architectural extensions. The 
majority of the bits are reserved. The default state of CR4 is all 
zeros. Figure 13 shows the register and describes the bits. The 
architectural extensions are described in Table 19. 








Symbol  Description                    Bit
GPE     Global Page Extension          7
MCE     Machine Check Enable           6
PSE     Page Size Extensions           4
DE      Debugging Extensions           3
TSD     Time Stamp Disable             2
PVI     Protected Virtual Interrupts   1
VME     Virtual-8086 Mode Extensions   0

All other bits are reserved.


Figure 13. Control Register 4 (CR4) 



21062C/0—March 1997 AMD K86™ Family BIOS and Software Tools Developers Guide 


Table 19. Control Register 4 (CR4) Fields 


Bit | Mnemonic | Description | Function
7 | GPE | Global Page Extension* | Enables retention of designated entries in the 4-Kbyte TLB or 4-Mbyte TLB during invalidations. 1 = enabled, 0 = disabled. See “Global Pages” on page 65 for details.
6 | MCE | Machine-Check Enable | Enables machine-check exceptions. 1 = enabled, 0 = disabled. See “Machine-Check Exceptions” on page 60 for details.
4 | PSE | Page Size Extension | Enables 4-Mbyte pages. 1 = enabled, 0 = disabled. See “4-Mbyte Pages” on page 60 for details.
3 | DE | Debugging Extensions | Enables I/O breakpoints in the DR7-DR0 registers. 1 = enabled, 0 = disabled. See “Debug Registers” on page 38 for details.
2 | TSD | Time Stamp Disable | Selects privileged (CPL=0) or non-privileged (CPL>0) use of the RDTSC instruction, which reads the Time Stamp Counter (TSC). 1 = CPL must be 0, 0 = any CPL. See “Time Stamp Counter (TSC)” on page 81 for details.
1 | PVI | Protected Virtual Interrupts | Enables hardware support for interrupt virtualization in Protected mode. 1 = enabled, 0 = disabled. See “Protected Virtual Interrupt (PVI) Extensions” on page 79 for details.
0 | VME | Virtual-8086 Mode Extensions | Enables hardware support for interrupt virtualization in Virtual-8086 mode. 1 = enabled, 0 = disabled. See “Virtual-8086 Mode Extensions (VME)” on page 67 for details.



* The AMD-K5 processor supports global paging only on Models 1, 2, and 3, with a Stepping of 4 or greater. 
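
The CR4 fields in Table 19 can be expressed as bit masks. The following is a minimal sketch in C, not code from this guide: the constant and function names are hypothetical, and the functions only model the bit tests on a CR4 value passed in by the caller (reading or writing CR4 itself requires CPL = 0 and is not shown).

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit masks, with bit positions taken from Figure 13 and Table 19.
   The names are illustrative and are not defined by this guide. */
enum {
    CR4_VME = 1u << 0,  /* Virtual-8086 Mode Extensions */
    CR4_PVI = 1u << 1,  /* Protected Virtual Interrupts */
    CR4_TSD = 1u << 2,  /* Time Stamp Disable */
    CR4_DE  = 1u << 3,  /* Debugging Extensions */
    CR4_PSE = 1u << 4,  /* Page Size Extension */
    CR4_MCE = 1u << 6,  /* Machine-Check Enable */
    CR4_GPE = 1u << 7   /* Global Page Extension */
};

/* Every bit that is not named above is treated as reserved. */
#define CR4_DEFINED_MASK \
    (CR4_VME | CR4_PVI | CR4_TSD | CR4_DE | CR4_PSE | CR4_MCE | CR4_GPE)

/* Returns true if the proposed CR4 value sets only defined bits. */
bool cr4_value_is_valid(uint32_t cr4)
{
    return (cr4 & ~(uint32_t)CR4_DEFINED_MASK) == 0;
}

/* Models the TSD rule: with TSD = 1, RDTSC is permitted only at CPL 0;
   with TSD = 0, RDTSC is permitted at any CPL. */
bool rdtsc_allowed(uint32_t cr4, unsigned cpl)
{
    return (cr4 & CR4_TSD) == 0 || cpl == 0;
}
```

A caller would typically build a value such as CR4_PSE | CR4_MCE, check it with cr4_value_is_valid, and only then write it to CR4 from privileged code, after confirming the corresponding CPUID feature flags.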
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Bit 6 in CR4, the machine-check enable (MCE) bit, controls 
generation of machine-check exceptions (12h). If enabled by 
the MCE bit, these exceptions are generated when either of the 
following occurs: 


■ System logic asserts BUSCHK to identify a parity or other
type of bus-cycle error


■ The processor asserts PCHK while system logic asserts PEN
to identify an enabled parity error on the D63-D0 data bus


Whether or not machine-check exceptions are enabled, the 
processor performs the following functions when either type of 
bus error occurs: 


■ Latches the physical address of the failed cycle in its 64-bit
machine-check address register (MCAR)


■ Latches the cycle definition of the failed cycle in its 64-bit
machine-check type register (MCTR)


Software can read the MCAR and MCTR registers in the 
exception handling routine with the RDMSR instruction, as 
described on page 90. The format of the registers is shown in 
Figures 20 and 21. 


If system software has cleared the MCE bit in CR4 to 0 before a 
bus-cycle error, the processor attempts to continue execution 
without generating a machine-check exception. The processor 
still latches the address and cycle type in MCAR and MCTR as 
described in this section. 


The TLBs in the 486 and 386 processors support only 4-Kbyte 
pages. However, large data structures, such as a video frame 
buffer or non-paged operating system code, can consume many 
pages and easily overrun the TLB. The AMD-K5 processor 
accommodates large data structures by allowing the operating 
system to specify 4-Mbyte pages as well as 4-Kbyte pages, and 
by implementing a four-entry, fully-associative 4-Mbyte TLB 
that is separate from the 128-entry, 4-Kbyte TLB. From a given 
page directory, the processor can access both 4-Kbyte pages 
and 4-Mbyte pages, and the page sizes can be intermixed within 
a page directory. When the Page Size Extension (PSE) bit in 
CR4 is set, the processor translates linear addresses using 
either the 4-Kbyte TLB or the 4-Mbyte TLB, depending on the 
state of the page size (PS) bit in the page-directory entry. 
Figures 14 and 15 show how 4-Kbyte and 4-Mbyte page 
translations work. 
Figure 14. 4-Kbyte Paging Mechanism


Figure 15. 4-Mbyte Paging Mechanism


To enable the 4-Mbyte paging option: 
1. Set the Page Size Extension (PSE) bit in CR4 to 1. 
2. Set the Page Size (PS) bit in the page-directory entry to 1. 


3. Write the physical base addresses of 4-Mbyte pages in bits 
31-22 of page-directory entries. (Bits 21-12 of these entries 
must be cleared to 0 or the processor generates a page 
fault.) 


4. Load CR3 with the base address of the page directory that 
contains these page-directory entries. 
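

Steps 2 and 3 above can be sketched as a helper that builds a page-directory entry for a 4-Mbyte page. This is an illustrative sketch only: the function and macro names are hypothetical, and setting the PSE bit in CR4 and loading CR3 (steps 1 and 4) are privileged operations that are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDE_P        (1u << 0)      /* Present */
#define PDE_WR       (1u << 1)      /* Write/Read */
#define PDE_US       (1u << 2)      /* User/Supervisor */
#define PDE_PS       (1u << 7)      /* Page Size: 1 = 4 Mbytes */
#define PDE_BASE_4M  0xFFC00000u    /* bits 31-22: 4-Mbyte page base */

/* Builds a present page-directory entry that maps one 4-Mbyte page.
   Returns false, and leaves *pde_out unchanged, if phys_base is not
   4-Mbyte aligned. Only bits 11-0 of 'flags' are used, so bits 21-12
   of the result are always 0, as the processor requires. */
bool make_pde_4m(uint32_t phys_base, uint32_t flags, uint32_t *pde_out)
{
    if (phys_base & ~PDE_BASE_4M)
        return false;
    *pde_out = phys_base | (flags & 0xFFFu) | PDE_PS | PDE_P;
    return true;
}
```

For example, mapping the 4-Mbyte page at physical address 00400000h as writable yields the entry 00400083h. Rejecting a misaligned base in software avoids the page fault that the processor would otherwise generate for nonzero bits 21-12.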
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Figure 13 and Table 19 show the fields in CR4. Figure 16 and 
Table 20 show the fields in a page-directory entry. 


4-Kbyte page translation differs from 4-Mbyte page translation 
in the following ways: 


■ 4-Kbyte Paging (Figure 14): Bits 31-22 of the linear address
select an entry in a 4-Kbyte page directory in memory,
whose physical base address is stored in CR3. Bits 21-12 of
the linear address select an entry in a 4-Kbyte page table in
memory, whose physical base address is specified by bits
31-12 of the page-directory entry. Bits 11-0 of the linear
address select a byte in a 4-Kbyte page, whose physical base
address is specified by the page-table entry.


■ 4-Mbyte Paging (Figure 15): Bits 31-22 of the linear address
select an entry in a 4-Mbyte page directory in memory,
whose physical base address is stored in CR3. Bits 21-0 of
the linear address select a byte in a 4-Mbyte page in
memory, whose physical base address is specified by bits
31-22 of the page-directory entry. Bits 21-12 of the
page-directory entry must be cleared to 0.
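

The two translation paths can be modeled in software. The sketch below is a simplified illustration, not the processor's implementation: it ignores the TLBs, access rights, and the Accessed and Dirty bits, and the callback type, function names, and demo memory are all hypothetical.

```c
#include <stdint.h>

#define PG_P   0x001u   /* Present */
#define PG_PS  0x080u   /* Page Size (page-directory entry only) */

/* Hypothetical callback: returns the 32-bit word at a physical address. */
typedef uint32_t (*read_phys32_fn)(uint32_t phys_addr);

/* Walks the paging structures for one linear address.
   Returns 0 and stores the physical address in *phys_out on success,
   or -1 where the processor would generate a page fault. */
int translate_linear(uint32_t cr3, int cr4_pse, uint32_t linear,
                     read_phys32_fn read_phys32, uint32_t *phys_out)
{
    /* Bits 31-22 of the linear address index the page directory. */
    uint32_t pde = read_phys32((cr3 & 0xFFFFF000u) + (linear >> 22) * 4u);
    if (!(pde & PG_P))
        return -1;

    if (cr4_pse && (pde & PG_PS)) {
        /* 4-Mbyte page: bits 21-12 of the entry must be 0. */
        if (pde & 0x003FF000u)
            return -1;
        *phys_out = (pde & 0xFFC00000u) | (linear & 0x003FFFFFu);
        return 0;
    }

    /* 4-Kbyte page: bits 21-12 of the linear address index the page table,
       whose base is in bits 31-12 of the page-directory entry. */
    uint32_t pte = read_phys32((pde & 0xFFFFF000u) +
                               ((linear >> 12) & 0x3FFu) * 4u);
    if (!(pte & PG_P))
        return -1;
    *phys_out = (pte & 0xFFFFF000u) | (linear & 0x00000FFFu);
    return 0;
}

/* Hypothetical demo memory: three 4-Kbyte frames starting at physical 1000h. */
static uint32_t demo_mem[3 * 1024];

uint32_t demo_read(uint32_t phys)
{
    return demo_mem[(phys - 0x1000u) / 4u];
}

void demo_write(uint32_t phys, uint32_t value)
{
    demo_mem[(phys - 0x1000u) / 4u] = value;
}
```

With a page directory at 1000h and a page table at 2000h in the demo memory, a 4-Kbyte mapping and a 4-Mbyte mapping can coexist in the same directory, which mirrors the intermixing of page sizes described earlier.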


Symbol  Description              Bits
AVL     Available to Software    11-9
G       Global                   8
PS      Page Size 0 = 4 Kbytes   7
        Reserved = 0             6
A       Accessed                 5
PCD     Page Cache Disable       4
PWT     Page Writethrough        3
U/S     User/Supervisor          2
W/R     Write/Read               1
P       Present (valid)          0


Figure 16. Page-Directory Entry (PDE) 
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Table 20. Page-Directory Entry (PDE) Fields 


Bit | Mnemonic | Description | Function
31-12 | BASE | Physical Base Address | For 4-Kbyte pages, bits 31-12 contain the physical base address of a 4-Kbyte page table. For 4-Mbyte pages, bits 31-22 contain the physical base address of a 4-Mbyte page and bits 21-12 must be cleared to 0. (The processor generates a page fault if bits 21-12 are not cleared to 0.)
11-9 | AVL | Available to Software | Software may use this field to store any type of information. When the page-directory entry is not present (P bit cleared), bits 31-1 become available to software.
8 | G | Global* | 0 = local, 1 = global.
7 | PS | Page Size | 0 = 4 Kbyte, 1 = 4 Mbyte.
6 | D | Dirty | For 4-Kbyte pages, this bit is undefined and ignored. The processor does not change it. For 4-Mbyte pages, the processor sets this bit to 1 during a write to the page that is mapped by this page-directory entry. 0 = not written, 1 = written.
5 | A | Accessed | The processor sets this bit to 1 during a read or write to any page that is mapped by this page-directory entry. 0 = not read or written, 1 = read or written.
4 | PCD | Page Cache Disable | Specifies cacheability for all pages mapped by this page-directory entry. Whether a location in a mapped page is actually cached also depends on several other factors. 0 = cacheable page, 1 = non-cacheable.
3 | PWT | Page Writethrough | Specifies writeback or writethrough cache protocol for all pages mapped by this page-directory entry. Whether a location in a mapped page is actually cached in a writeback or writethrough state also depends on several other factors. 0 = writeback page, 1 = writethrough page.
2 | U/S | User/Supervisor | 0 = supervisor (CPL < 3), 1 = user (any CPL).
1 | W/R | Write/Read | 0 = read or execute, 1 = write, read, or execute.
0 | P | Present | 0 = not valid, 1 = valid.


* The AMD-K5 processor supports global paging only on Models 1, 2, and 3, with a Stepping of 4 or greater.
WR 
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The AMD-K5 processor supports global paging only on Models 
1, 2, and 3, with a Stepping of 4 or greater. 


The processor’s performance can sometimes be improved by 
making some pages global to all tasks and procedures. This can 
be done for both 4-Kbyte pages and 4-Mbyte pages. 


The processor invalidates (flushes) both the 4-Kbyte TLB and 
the 4-Mbyte TLB whenever CR3 is loaded with the base address 
of the new task’s page directory. The processor loads CR3 
automatically during task switches, and the operating system 
can load CR3 at any other time. Unnecessary invalidation of 
certain TLB entries can be avoided by specifying those entries 
as global (a global TLB entry references a global page). This 
improves performance after TLB flushes. Global entries remain 
in the TLB and need not be reloaded. For example, entries may 
reference operating system code and data pages that are 
always required. The processor operates faster if these entries 
are retained across task switches and procedure calls. 


To specify individual pages as global: 
1. Set the Global Page Extension (GPE) bit in CR4. 
2. (Optional) Set the Page Size Extension (PSE) bit in CR4.


3. Set the relevant Global (G) bit for that page: 


For 4-Kbyte pages—Set the G bit in both the page-directory 
entry (shown in Figure 16 and Table 20) and the page-table 
entry (shown in Figure 17 and Table 21). 


For 4-Mbyte pages—(Optional) After the PSE bit in CR4 is 
set, set the G bit in the page-directory entry (shown in 
Figure 16 and Table 20). 


4. Load CR3 with the base address of the page directory.


The INVLPG instruction clears both the V and G bits for the
referenced entry. To invalidate all entries in both TLBs,
including global-page entries:


1. Clear the Global Page Extension (GPE) bit in CR4. 


2. Load CR3 with the base address of another (or same) page 
directory. 
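

The invalidation rules above can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the processor's TLB hardware: the tlb_entry structure and both function names are hypothetical, and each entry is reduced to its V and G bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one TLB entry: only the V and G bits are kept. */
typedef struct {
    bool valid;   /* V bit */
    bool global;  /* G bit */
} tlb_entry;

/* Models a CR3 load. Every entry is invalidated, except that global
   entries are retained while the GPE bit in CR4 is set. */
void tlb_flush_on_cr3_load(tlb_entry *tlb, size_t count, bool cr4_gpe)
{
    for (size_t i = 0; i < count; i++) {
        if (!(cr4_gpe && tlb[i].global))
            tlb[i].valid = false;
    }
}

/* Models INVLPG on the referenced entry: clears both the V and G bits. */
void tlb_invlpg(tlb_entry *entry)
{
    entry->valid = false;
    entry->global = false;
}
```

Calling tlb_flush_on_cr3_load with cr4_gpe set to false corresponds to the two-step sequence above: with GPE cleared, a CR3 load invalidates global entries as well.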
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Physical Page Base Address 


Symbol  Description              Bits
AVL     Available to Software    11-9
G       Global                   8
PS      Page Size 1 = 4 Mbytes   7
        Reserved = 0             6
A       Accessed                 5
PCD     Page Cache Disable       4
PWT     Page Writethrough        3
U/S     User/Supervisor          2
W/R     Write/Read               1
P       Present (valid)          0


Figure 17. Page-Table Entry (PTE) 


Table 21. Page-Table Entry (PTE) Fields 


Bit | Mnemonic | Description | Function
31-12 | BASE | Physical Base Address | The physical base address of a 4-Kbyte page.
11-9 | AVL | Available to Software | Software may use the field to store any type of information. When the page-table entry is not present (P bit cleared), bits 31-1 become available to software.
8 | G | Global* | 0 = local, 1 = global.
7 | PS | Page Size | This bit is ignored in page-table entries, although clearing it to 0 preserves consistent usage of this bit between page-table and page-directory entries.
6 | D | Dirty | The processor sets this bit to 1 during a write to the page that is mapped by this page-table entry. 0 = not written, 1 = written.
5 | A | Accessed | The processor sets this bit to 1 during a read or write to any page that is mapped by this page-table entry. 0 = not read or written, 1 = read or written.
4 | PCD | Page Cache Disable | Specifies cacheability for all locations in the page mapped by this page-table entry. Whether a location is actually cached also depends on several other factors. 0 = cacheable page, 1 = non-cacheable.
3 | PWT | Page Writethrough | Specifies writeback or writethrough cache protocol for all locations in the page mapped by this page-table entry. Whether a location is actually cached in a writeback or writethrough state also depends on several other factors. 0 = writeback, 1 = writethrough.
2 | U/S | User/Supervisor | 0 = supervisor (CPL < 3), 1 = user (any CPL).
1 | W/R | Write/Read | 0 = read or execute, 1 = write, read, or execute.
0 | P | Present | 0 = not valid, 1 = valid.

Note:
* The AMD-K5 processor supports global paging only on Models 1, 2, and 3, with a Stepping of 4 or greater.


Virtual-8086 Mode Extensions (VME)





The Virtual-8086 Mode Extensions (VME) bit in CR4 (bit 0)
enables performance enhancements for 8086 programs running
as protected tasks in Virtual-8086 mode. These extensions
include:


■ Virtualizing maskable external interrupt control and
notification via the VIF and VIP bits in EFLAGS


■ Selectively intercepting software interrupts (INTn
instructions) via the Interrupt Redirection Bitmap (IRB) in
the Task State Segment (TSS)


Interrupt Redirection in Virtual-8086 Mode Without VME Extensions. 8086 
programs expect to have full access to the interrupt flag (IF) in 
the EFLAGS register, which enables maskable external 
interrupts via the INTR signal. When 8086 programs run in 
Virtual-8086 mode on a 386 or 486 processor, they run as 
protected tasks and access to the IF flag must be controlled by 
the operating system on a task-by-task basis to prevent 
corruption of system resources. 


Without the VME extensions available on the AMD-K5 
processor, the operating system controls Virtual-8086 mode 
access to the IF flag by trapping instructions that can read or 
write this flag. These instructions include STI, CLI, PUSHF, 
POPF, INTn, and IRET. This method prevents changes to the 
real IF when the I/O privilege level (IOPL) in EFLAGS is less 
than 3, the privilege level at which all Virtual-8086 tasks run. 
The operating system maintains an image of the IF flag for each 
Virtual-8086 program by emulating the instructions that read 
or write IF. When an external maskable interrupt occurs, the 
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operating system checks the state of the IF image for the 
current Virtual-8086 program to determine whether the 
program is allowing interrupts. If the program has disabled 
interrupts, the operating system saves the interrupt 
information until the program attempts to re-enable interrupts. 


The overhead for trapping and emulating the instructions that 
enable and disable interrupts and the maintenance of virtual 
interrupt flags for each Virtual-8086 program can degrade the 
processor’s performance. This performance can be regained by 
running Virtual-8086 programs with IOPL set to 3, thus 
allowing changes to the real IF flag from any privilege level, 
but with a loss in protection. 


In addition to the performance overhead caused by 
virtualization of the IF flag in Virtual-8086 mode, software 
interrupts (those caused by INTn instructions that vector 
through interrupt gates) cannot be masked by the IF flag or 
virtual copies of the IF flag. These flags only affect hardware 
interrupts. Software interrupts in Virtual-8086 mode are 
normally directed to the Real mode interrupt vector table 
(IVT), but it may be desirable to redirect interrupts for certain 
vectors to the Protected mode interrupt descriptor table (IDT). 


The processor’s Virtual-8086 mode extensions support both of 
these cases—hardware (external) interrupts and software 
interrupts—with mechanisms that preserve high performance 
without compromising protection. Virtualization of hardware 
interrupts is supported via the Virtual Interrupt Flag (VIF) and 
Virtual Interrupt Pending (VIP) flag in the EFLAGS register. 
Redirection of software interrupts is supported with the 
Interrupt Redirection Bitmap (IRB) in the TSS of each 
Virtual-8086 program. 


Hardware Interrupts and the VIF and VIP Extensions. 

When VME extensions are enabled, the IF-modifying 
instructions that are normally trapped by the operating system 
are allowed to execute, but they write and read the VIF bit 
rather than the IF bit in EFLAGS. This operation leaves 
maskable interrupts enabled for detection by the operating 
system. It also indicates to the operating system whether the 
Virtual-8086 program is able to, or expecting to, receive 
interrupts. 
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When an external interrupt occurs, the processor switches from 
the Virtual-8086 program to the operating system, in the same 
manner as on a 386 or 486 processor. If the operating system 
determines the interrupt is for the Virtual-8086 program, it 
checks the state of the VIF bit in the program’s EFLAGS image 
on the stack. If VIF has been set by the processor (during an 
attempt by the program to set the IF bit), the operating system 
permits access to the appropriate Virtual-8086 handler via the 
interrupt vector table (IVT). If VIF has been cleared, the 
operating system holds the interrupt pending. The operating 
system can do this by saving appropriate information (such as 
the interrupt vector), setting the program's VIP flag in the 
EFLAGS image on the stack, and returning to the interrupted 
program. When the program subsequently attempts to set IF, 
the set VIP flag causes the processor to inhibit the instruction 
and generate a general-protection exception with error code 
zero, thereby notifying the operating system that the program 
is now prepared to accept the interrupt. 


Thus, when VME extensions are enabled, the VIF and VIP bits 
are set and cleared as follows: 


■ VIF: This bit is controlled by the processor and used by the
operating system to determine whether an external
maskable interrupt should be passed on to the program or
held pending. VIF is set and cleared for instructions that can
modify IF, and it is cleared during software interrupts
through interrupt gates. The original IF value is preserved
in the EFLAGS image on the stack.


■ VIP: This bit is set and cleared by the operating system via
the EFLAGS image on the stack. It is set when an interrupt
occurs for a Virtual-8086 program whose VIF bit is cleared.
The bit is checked by the processor when the program
subsequently attempts to set VIF.


Figure 18 and Table 22 show the VIF and VIP bits in the 
EFLAGS register. The VME extensions support conventional 
emulation methods for passing interrupts to Virtual-8086 
programs, but they make it possible for the operating system to 
avoid time-consuming emulation of most instructions that write 
or read the IF. 
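

The interaction between the operating system and the processor through VIF and VIP can be sketched as follows. This is a simplified model under stated assumptions, not operating system or microcode source: it assumes a Virtual-8086 task running with VME enabled and IOPL less than 3, and every name other than the bit positions (19 and 20, from Figure 18) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19)  /* Virtual Interrupt Flag */
#define EFLAGS_VIP (1u << 20)  /* Virtual Interrupt Pending */

typedef enum { INT_DELIVER, INT_HELD_PENDING } int_action;

/* Operating system side: decides what to do with an external interrupt
   intended for a Virtual-8086 program, using the EFLAGS image on the
   stack. If VIF is clear, the interrupt is held and VIP is set. */
int_action os_route_external_interrupt(uint32_t *eflags_image)
{
    if (*eflags_image & EFLAGS_VIF)
        return INT_DELIVER;
    *eflags_image |= EFLAGS_VIP;
    return INT_HELD_PENDING;
}

/* Processor side: models STI in the assumed mode. Returns true if the
   instruction is inhibited and a general-protection exception with
   error code zero is raised; otherwise sets VIF and returns false. */
bool vme_sti(uint32_t *eflags)
{
    if (*eflags & EFLAGS_VIP)
        return true;
    *eflags |= EFLAGS_VIF;
    return false;
}

/* Processor side: models CLI in the assumed mode by clearing VIF. */
void vme_cli(uint32_t *eflags)
{
    *eflags &= ~EFLAGS_VIF;
}
```

In this model the general-protection exception is the notification path: the operating system's handler would clear VIP, deliver the saved interrupt, and let the program continue with VIF set.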


The VIF and IF flags only affect the way the operating system 
deals with hardware interrupts (the INTR signal). Software 
interrupts are handled like machine-generated exceptions and 
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cannot be masked by real or virtual copies of IF (See “Software 
Interrupts and the Interrupt Redirection Bitmap (IRB) 
Extension” on page 75). The VIF and VIP flags only ease the 
software overhead associated with managing interrupts so that 
virtual copies of the IF flag do not have to be maintained by the 
operating system. Instead, each task’s TSS holds its own copy of 
these flags in its EFLAGS image. 


Symbol  Description                 Bits
ID      ID Flag                     21
VIP     Virtual Interrupt Pending   20
VIF     Virtual Interrupt Flag      19
AC      Alignment Check             18
VM      Virtual-8086 Mode           17
RF      Resume Flag                 16
NT      Nested Task                 14
IOPL    I/O Privilege Level         13-12
OF      Overflow Flag               11
DF      Direction Flag              10
IF      Interrupt Flag              9
TF      Trap Flag                   8
SF      Sign Flag                   7
ZF      Zero Flag                   6
AF      Auxiliary Flag              4
PF      Parity Flag                 2
CF      Carry Flag                  0


Figure 18. EFLAGS Register 




Table 22. Virtual-Interrupt Additions to EFLAGS Register 


Bit | Mnemonic | Description | Function
20 | VIP | Virtual Interrupt Pending | Set by the operating system (via the EFLAGS image on the stack) when an external maskable interrupt (INTR) occurs for a Virtual-8086 program whose VIF bit is cleared. The bit is checked by the processor when the program subsequently attempts to set VIF.
19 | VIF | Virtual Interrupt Flag | When the VME bit in CR4 is set, the VIF bit is modified by the processor when a Virtual-8086 program running at less privilege than the IOPL attempts to modify the IF bit. The VIF bit is used by the operating system to determine whether a maskable interrupt should be passed on to the program or held pending.





Tables 23 through 27 show the effects, in various x86-processor
modes, of instructions that read or write the IF and VIF flags.
The column headings in these tables include the following values:


■ PE: Protection Enable bit in CR0 (bit 0)
■ VM: Virtual-8086 Mode bit in EFLAGS (bit 17)
■ VME: Virtual Mode Extensions bit in CR4 (bit 0)
■ PVI: Protected-mode Virtual Interrupts bit in CR4 (bit 1)
■ IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)
■ Handler CPL: Code Privilege Level of the interrupt handler
■ GP(0): General-protection exception, with error code = 0
■ IF: Interrupt Flag bit in EFLAGS (bit 9)
■ VIF: Virtual Interrupt Flag bit in EFLAGS (bit 19)


Table 23. Instructions that Modify the IF or VIF Flags—Real Mode 


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Table 24. Instructions that Modify the IF or VIF Flags—Protected Mode 


Notes:
* GP(0), if the CPL of the task executing IRETD is greater than the CPL of the task to which it is returning.
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Table 25. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode 


Notes:
1. All Virtual-8086 mode tasks run at CPL = 3.
2. All protected virtual interrupt handlers run at CPL = 0.
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Table 26. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode Interrupt 


Extensions (VME)


Notes:
1. All Virtual-8086 mode tasks run at CPL = 3.
2. All protected virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
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Table 27. Instructions that Modify the IF or VIF Flags—Protected Mode Virtual 
Interrupt Extensions (PVI)


Notes:
1. All Protected mode virtual interrupt tasks run at CPL = 3.
2. All protected mode virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
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Software Interrupts and the Interrupt Redirection Bitmap (IRB) Extension. 


In Virtual-8086 mode, software interrupts (INTn exceptions 
that vector through interrupt gates) are trapped by the 
operating system for emulation because they would otherwise 
clear the real IF. When VME extensions are enabled, these 
INTn instructions are allowed to execute normally, vectoring 
directly to a Virtual-8086 service routine via the Virtual-8086 
interrupt vector table (IVT) at address 0 of the task address 
space. However, it may still be desirable for security or 
performance reasons to intercept INTn instructions on a
vector-specific basis to allow servicing by Protected-mode
routines accessed through the interrupt descriptor table (IDT).
This is accomplished by an Interrupt Redirection Bitmap (IRB)
in the TSS, which is created by the operating system in a
manner similar to the I/O Permission Bitmap (IOPB) in the TSS.
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Figure 19 shows the format of the TSS with the Interrupt 
Redirection Bitmap near the top. The IRB contains 256 bits, one 
for each possible software-interrupt vector. The 
most-significant bit of the IRB is located immediately below the 
base of the IOPB. This bit controls interrupt vector 255. The 
least-significant bit of the IRB controls interrupt vector 0. 


The bits in the IRB work as follows: 


■ Set: If set to 1, the INTn instruction behaves as if the VME
extensions are not enabled. The interrupt vectors to a
Protected-mode routine if IOPL = 3, or it causes a
general-protection exception with error code zero if IOPL < 3.


■ Cleared: If cleared to 0, the INTn instruction vectors
directly to the corresponding Virtual-8086 service routine
via the Virtual-8086 program’s IVT.
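

The IRB lookup can be sketched in C. The sketch assumes a Virtual-8086 task with VME enabled, a TSS that is available as a byte array, and an IOPB base offset of at least 32 supplied by the caller; the function and type names are hypothetical. The bit layout follows the text above: the IRB occupies the 32 bytes immediately below the IOPB base, with vector 0 in the least-significant bit and vector 255 in the most-significant bit.

```c
#include <stdint.h>

typedef enum { ROUTE_V86_IVT, ROUTE_PROTECTED_IDT, ROUTE_GP0 } int_route;

/* Returns the IRB bit (0 or 1) for a software-interrupt vector.
   'tss' points to the first byte of the TSS and 'iopb_base' is the
   byte offset of the IOPB within the TSS (must be at least 32). */
int irb_bit(const uint8_t *tss, uint16_t iopb_base, uint8_t vector)
{
    const uint8_t *irb = tss + iopb_base - 32;
    return (irb[vector / 8] >> (vector % 8)) & 1;
}

/* Models where an INTn instruction is directed in the assumed mode. */
int_route route_intn(const uint8_t *tss, uint16_t iopb_base,
                     uint8_t vector, unsigned iopl)
{
    if (irb_bit(tss, iopb_base, vector) == 0)
        return ROUTE_V86_IVT;          /* cleared: Virtual-8086 IVT */
    return (iopl == 3) ? ROUTE_PROTECTED_IDT : ROUTE_GP0;
}
```

An operating system that wants to intercept, for example, only INT 21h would leave the IRB cleared except for bit 21h, so that all other software interrupts go directly to the program's own handlers.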


Only software interrupts can be redirected via the IRB to a
Real mode IVT; hardware interrupts cannot. Hardware
interrupts are asynchronous events and do not belong to any
current virtual task. The processor thus has no way of deciding
which IVT (for which Virtual-8086 program) to direct a
hardware interrupt to. Hardware interrupts, therefore, always
require operating system intervention. The VIF and VIP bits
described in “Hardware Interrupts and the VIF and VIP
Extensions” on page 68 are provided to assist the operating
system in this intervention.
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TSS Limit 
I/O Permission Bitmap (IOPB)
(up to 8 Kbytes)

Interrupt Redirection Bitmap (IRB)
(eight 32-bit locations)

Operating System
Data Structure

ESP2
0000h  SS1
ESP1
0000h  SS0
ESP0
Link (Prior TSS Selector)


Figure 19. Task State Segment (TSS) 
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Table 28 compares the behavior of hardware and software 
interrupts in various x86-processor operating modes. It also 
shows which interrupt table is accessed: the Protected-mode 
IDT or the Real- and Virtual-8086-mode IVT. The column 
headings in this table include: 

■ PE: Protection Enable bit in CR0 (bit 0)
■ VM: Virtual-8086 Mode bit in EFLAGS (bit 17)
■ VME: Virtual Mode Extensions bit in CR4 (bit 0)
■ PVI: Protected-Mode Virtual Interrupts bit in CR4 (bit 1)
■ IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)
■ IRB: Interrupt Redirection Bit for a task, from the
Interrupt Redirection Bitmap (IRB) in the task’s TSS
■ GP(0): General-protection exception, with error code = 0
■ IDT: Protected-Mode Interrupt Descriptor Table
■ IVT: Real- and Virtual-8086 Mode Interrupt Vector Table


Table 28. Interrupt Behavior and Interrupt-Table Access 


Notes:
* All Virtual-8086 tasks run at CPL = 3.
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The Protected Virtual Interrupts (PVI) bit in CR4 enables 
support for interrupt virtualization in Protected mode. In this 
virtualization, the processor maintains program-specific VIF 
and VIP flags in a manner similar to those in Virtual-8086 Mode 
Extensions (VME). When a program is executed at CPL = 3, it 
can set and clear its copy of the VIF flag without causing 
general-protection exceptions. 


The only differences between the VME and PVI extensions are 
that, in PVI, selective INTn interception using the Interrupt 
Redirection Bitmap in the TSS does not apply, and only the STI 
and CLI instructions are affected by the extension. 


Tables 23 through 28 show, among other things, the behavior of 
hardware and software interrupts as well as instructions that 
affect interrupts in Protected mode with the PVI extensions 
enabled. 


Model-Specific Registers (MSRs) 


AMD-K5™ Processor 


The processor supports MSRs that can be accessed with the 
RDMSR and WRMSR instructions when CPL = 0. The following 
index values in the ECX register access specific MSRs: 


■ Machine-Check Address Register (MCAR)—ECX = 00h
■ Machine-Check Type Register (MCTR)—ECX = 01h
■ Time Stamp Counter (TSC)—ECX = 10h
■ Array Access Register (AAR)—ECX = 82h
■ Hardware Configuration Register (HWCR)—ECX = 83h
■ Write Allocate Top-of-Memory and Control Register
  (WATMCR)—ECX = 85h
■ Write Allocate Programmable Memory Range Register
  (WAPMRR)—ECX = 86h


Note: The AMD-K5 processor supports write allocate only on 
Models 1, 2, and 3, with a Stepping of 4 or greater. 


The RDMSR and WRMSR instructions are described on page 
90. The following sections describe the format of the registers. 

Machine-Check Address Register (MCAR)

The processor latches the address of the current bus cycle in its
64-bit Machine-Check Address Register (MCAR) when a
bus-cycle error occurs. These errors are indicated either by (a)
system logic asserting BUSCHK, or (b) the processor asserting
PCHK while system logic asserts PEN.

The MCAR can be read with the RDMSR instruction when the
ECX register contains the value 00h. Figure 20 shows the
format of the MCAR register.

If system software has set the MCE bit in CR4 before the
bus-cycle error, the processor also generates a machine-check
exception as described on page 60.

Bits 63-0: Physical Address of Last Failed Bus Cycle

Figure 20. Machine-Check Address Register (MCAR)


Machine-Check Type Register (MCTR)

The processor latches the cycle definition and other
information about the current bus cycle in its 64-bit
Machine-Check Type Register (MCTR) at the same time that
the Machine-Check Address Register (MCAR) latches the cycle
address, that is, when a bus-cycle error occurs. These errors are
indicated either by (a) system logic asserting BUSCHK, or (b)
the processor asserting PCHK while system logic asserts PEN.

The MCTR can be read with the RDMSR instruction when the
ECX register contains the value 01h. Figure 21 and Table 29
show the formats of the MCTR register. The processor clears
the CHK bit (bit 0) in MCTR when the register is read with
the RDMSR instruction.


If system software has set the MCE bit in CR4 before the 
bus-cycle error, the processor also generates a machine-check 
exception as described on page 60. 

Symbol    Description                 Bits
          Reserved                    63-5
LOCK      Locked Cycle                4
M/IO      Memory or I/O Cycle         3
D/C       Data or Code Cycle          2
W/R       Write or Read Cycle         1
CHK       Valid Machine-Check Data    0

Figure 21. Machine-Check Type Register (MCTR)

Table 29. Machine-Check Type Register (MCTR) Fields

Bit  Mnemonic  Description     Function
4    LOCK      Locked Cycle    Set to 1 if the processor was asserting
                               LOCK during the bus cycle
3    M/IO      Memory or I/O   1 = memory cycle, 0 = I/O cycle
2    D/C       Data or Code    1 = data cycle, 0 = code cycle
1    W/R       Write or Read   1 = write cycle, 0 = read cycle
0    CHK       Valid           The processor sets the CHK bit to 1 when
               Machine-Check   both the MCTR and MCAR registers contain
               Data            valid information. The processor clears
                               the CHK bit to 0 when software reads the
                               MCTR with the RDMSR instruction.
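
The field layout above can be turned into a small decoder. The
following C fragment is a minimal sketch, not vendor code: the
structure and function names are hypothetical, and it assumes the
caller has already read MCTR with RDMSR at CPL = 0 and passes in
the low doubleword (EAX).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in Table 29. */
#define MCTR_CHK   (1u << 0)  /* valid machine-check data     */
#define MCTR_WR    (1u << 1)  /* 1 = write cycle, 0 = read    */
#define MCTR_DC    (1u << 2)  /* 1 = data cycle, 0 = code     */
#define MCTR_MIO   (1u << 3)  /* 1 = memory cycle, 0 = I/O    */
#define MCTR_LOCK  (1u << 4)  /* LOCK asserted during cycle   */

/* Hypothetical decoded view of the low doubleword of MCTR. */
typedef struct {
    bool valid;
    bool is_write;
    bool is_data;
    bool is_memory;
    bool locked;
} mctr_info;

/* Decode the EAX value returned by RDMSR with ECX = 01h. */
static mctr_info mctr_decode(uint32_t eax)
{
    mctr_info info;
    info.valid     = (eax & MCTR_CHK)  != 0;
    info.is_write  = (eax & MCTR_WR)   != 0;
    info.is_data   = (eax & MCTR_DC)   != 0;
    info.is_memory = (eax & MCTR_MIO)  != 0;
    info.locked    = (eax & MCTR_LOCK) != 0;
    return info;
}
```

Because reading MCTR clears CHK, a handler would normally decode
the value from a single RDMSR rather than reading the register
twice.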





Time Stamp Counter (TSC)

With each processor clock cycle, the processor increments a
64-bit time stamp counter (TSC) MSR. The counter can be
written or read using the WRMSR or RDMSR instructions when
the ECX register contains the value 10h and CPL = 0. The
counter can also be read using the RDTSC instruction (see
page 89), but the required privilege level for this instruction is
determined by the Time Stamp Disable (TSD) bit in CR4. With
any of these instructions, the EDX and EAX registers hold the
upper and lower doublewords (dwords) of the 64-bit value to be
written to or read from the TSC, as follows:

■ EDX—Upper 32 bits of TSC
■ EAX—Lower 32 bits of TSC


The TSC can be loaded with any arbitrary value. 
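
The EDX:EAX split applies to every MSR access, so it is worth
having helpers for it. This is a minimal C sketch with
hypothetical function names; it only packs and unpacks the two
halves and does not itself execute RDMSR, WRMSR, or RDTSC.

```c
#include <stdint.h>

/* Combine the EDX (upper) and EAX (lower) halves into 64 bits. */
static uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Upper doubleword, to be placed in EDX before WRMSR. */
static uint32_t msr_high(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Lower doubleword, to be placed in EAX before WRMSR. */
static uint32_t msr_low(uint64_t value)
{
    return (uint32_t)value;
}
```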

The Array Access Register (AAR) contains pointers for testing
the tag and data arrays for the instruction cache, data cache, 
4-Kbyte TLB, and 4-Mbyte TLB. The AAR can be written or 
read with the WRMSR or RDMSR instruction when the ECX 
register contains the value 82h. 


For details on the AAR, see “Cache and TLB Testing” on 
page 27. 


The Hardware Configuration Register (HWCR) contains 
configuration bits that control miscellaneous debugging 
functions. The HWCR can be written or read with the WRMSR 
or RDMSR instruction when the ECX register contains the 
value 83h. 


For details on the HWCR, see “Hardware Configuration 
Register (HWCR)” on page 22. 


The AMD-K5 processor supports write allocate only on Models
1, 2, and 3, with a Stepping of 4 or greater. Use the CPUID
instruction to determine if the proper revision of the processor
is present (see the AMD Processor Recognition Application Note,
order# 20734, located at http://www.amd.com).
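
The revision check can be sketched in C as follows. This is an
illustration under stated assumptions, not the method of the
application note: it assumes the usual CPUID function 1 signature
layout in EAX (stepping in bits 3-0, model in bits 7-4, family in
bits 11-8), assumes the AMD-K5 reports family 5, and assumes the
caller has already confirmed an AMD vendor string. The function
name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a CPUID function 1 signature (EAX) describes an
 * AMD-K5 that supports write allocate: Model 1, 2, or 3 with a
 * Stepping of 4 or greater. Family 5 is an assumption here.
 */
static bool k5_supports_write_allocate(uint32_t signature)
{
    uint32_t stepping = signature & 0xFu;
    uint32_t model    = (signature >> 4) & 0xFu;
    uint32_t family   = (signature >> 8) & 0xFu;

    if (family != 5u)
        return false;
    return model >= 1u && model <= 3u && stepping >= 4u;
}
```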


Two MSRs are defined to support write allocate. The MSRs are 
accessed using the RDMSR and WRMSR instructions (see 
“RDMSR and WRMSR” of the AMD-K5™ Processor Software 
Development Guide, order# 20007). The following index values 
in the ECX register access the MSRs: 


■ Write Allocate Top-of-Memory and Control Register
  (WATMCR)—ECX = 85h
■ Write Allocate Programmable Memory Range Register
  (WAPMRR)—ECX = 86h


For more information about write allocate, see the 
Implementation of Write Allocate in the K86™ Processors 
Application Note, order# 21326. 


Three non-write-allocatable memory ranges are defined for use
with the write allocate feature—one fixed range and two
programmable ranges.

Fixed Range. The fixed memory range is 000A_0000h-000F_FFFFh
and can be enabled or disabled. When enabled,
write allocate cannot be performed in this range.


This region of memory, which includes standard VGA and other
peripheral and BIOS access, is considered non-cacheable.
Performing a write allocate in this area can cause compatibility
problems. It is recommended that bit 16 of WATMCR be set to 1
to enable protection of this range and prevent write allocate
to it.


Programmable Range. One programmable memory range is
xxxx_0000h-yyyy_FFFFh, where xxxx and yyyy are defined
using bits 15-0 and bits 31-16 of WAPMRR, respectively. Set
bit 17 of WATMCR to enable protection of this range. When
enabled, write allocate cannot be performed in this range.


This programmable memory range exists because a small 
number of uncommon memory-mapped I/O adapters are 
mapped to physical RAM locations. If a card like this exists in 
the system configuration, it is recommended that the BIOS 
program the ‘memory hole’ for the adapter into this 
non-write-allocatable range. 
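
As an illustration of the WAPMRR encoding, the C sketch below
builds the low doubleword from the first and last byte addresses
of a memory hole. The function names are hypothetical and the
15-16 Mbyte hole in the usage note is only an example value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The range must start on a 64-Kbyte boundary (xxxx_0000h) and
 * end on the last byte of a 64-Kbyte block (yyyy_FFFFh).
 */
static bool wapmrr_range_ok(uint32_t first, uint32_t last)
{
    return (first & 0xFFFFu) == 0u &&
           (last & 0xFFFFu) == 0xFFFFu &&
           first < last;
}

/* Low doubleword of WAPMRR: yyyy in bits 31-16, xxxx in bits 15-0. */
static uint32_t wapmrr_encode(uint32_t first, uint32_t last)
{
    uint32_t xxxx = first >> 16;
    uint32_t yyyy = last >> 16;
    return (yyyy << 16) | xxxx;
}
```

For a hole at 00F0_0000h-00FF_FFFFh, wapmrr_encode() yields
00FF_00F0h; the upper doubleword (EDX) of the WRMSR would be zero
because bits 63-32 are reserved.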


Top of Memory. The other programmable memory range is 
defined by the ‘top-of-memory’ field. The top of memory is 
equal to zzzz_0000h, where zzzz is defined using bits 15-0 of 
WATMCR. Addresses above zzzz_0000h are protected from 
write allocate when bit 18 of WATMCR is enabled. 


Once the BIOS determines the size of RAM installed in the 
system, this size should also be used to program the top of 
memory. For example, a system with 32 Mbytes of RAM 
requires that the top-of-memory field be programmed with a 
value of 0200h, which enables protection from write allocate 
for memory above that value. Set bit 18 of WATMCR to enable 
protection of this range. 


Caching and write allocate are generally not performed for the 
memory above the amount of physical RAM in the system. 
Video frame buffers are usually mapped above physical RAM. 
If write allocate were attempted in that memory area, there 
could be performance degradation or compatibility problems. 

Bits 18-16 of WATMCR control the enabling or disabling of the
three memory ranges as follows:

■ Bit 18: Top-of-Memory Enable bit
  0 = disabled (default)
  1 = enabled (write allocate cannot be performed above Top
  of Memory)

■ Bit 17: Programmable Range Enable bit
  0 = disabled (default)
  1 = enabled (write allocate cannot be performed in this
  range)

■ Bit 16: Fixed Range Enable bit
  0 = disabled (default)
  1 = enabled (write allocate cannot be performed in this
  range)
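
Putting the top-of-memory field and the three enable bits
together, the low doubleword of WATMCR could be composed as in
this C sketch. The macro and function names are hypothetical. The
field is the installed RAM size expressed in 64-Kbyte units, so a
size in Mbytes is multiplied by 16.

```c
#include <stdint.h>

#define WATMCR_FRE  (1u << 16)  /* Fixed Range Enable         */
#define WATMCR_PRE  (1u << 17)  /* Programmable Range Enable  */
#define WATMCR_TME  (1u << 18)  /* Top-of-Memory Enable       */

/*
 * Build the low doubleword of WATMCR. ram_mbytes is the amount
 * of installed RAM; zzzz = top-of-memory address bits 31-16.
 */
static uint32_t watmcr_encode(uint32_t ram_mbytes, uint32_t enables)
{
    uint32_t zzzz = (ram_mbytes * 16u) & 0xFFFFu;
    uint32_t ctl  = enables & (WATMCR_FRE | WATMCR_PRE | WATMCR_TME);
    return ctl | zzzz;
}
```

With 32 Mbytes of RAM the field is 0200h, matching the example in
the text; enabling the fixed range and top-of-memory protection
gives watmcr_encode(32, WATMCR_TME | WATMCR_FRE) = 0005_0200h.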


Figures 22 and 23 show the bit positions for these two new 
registers. 


Bits     Field
63-19    Reserved
18       Top-of-Memory Enable (TME)
17       Programmable Range Enable (PRE)
16       Fixed Range Enable (FRE)
15-0     Top of Memory—zzzz

Figure 22. Write Allocate Top-of-Memory and Control Register (WATMCR)—MSR 85h


Bits     Field
63-32    Reserved
31-16    Programmable Range—yyyy (High - yyyy_FFFFh)
15-0     Programmable Range—xxxx (Low - xxxx_0000h)

Figure 23. Write Allocate Programmable Memory Range Register (WAPMRR)—MSR 86h

Enable Write Allocate


Write allocate is enabled by setting bit 4 (WA) of the HWCR to 
1. For more information on the HWCR, see “Hardware 
Configuration Register (HWCR)” on page 22. Figure 2 on page 
23 shows the revised definition of the Hardware Configuration 
Register. 


New AMD-K5™ Processor Instructions 


AMD-K5™ Processor 


In addition to supporting all the 486 processor instructions, the
AMD-K5 processor implements the following instructions:

■ CPUID
■ CMPXCHG8B
■ MOV to and from CR4
■ RDTSC
■ RDMSR
■ WRMSR
■ RSM
■ Illegal instruction (reserved opcode)

CPUID

mnemonic    opcode    description
CPUID       0F A2h    Identify processor and its feature set

Privilege:             Any level
Registers Affected:    EAX, EBX, ECX, EDX
Flags Affected:        None
Exceptions Generated:  None


The CPUID instruction is an application-level instruction that software executes to 
identify the processor and its feature set. This instruction offers multiple functions, 
each providing a different set of information about the processor. The CPUID 
instruction can be executed from any privilege level. Software can use the 
information returned by this instruction to tune its functionality for the specific 
processor and its features. 


Not all processors implement the CPUID instruction. Therefore, software must test to 
determine if the instruction is present on the processor. If the ID bit (21) in the 
EFLAGS register is writeable, the CPUID instruction is implemented. 


The CPUID instruction supports multiple functions. The information associated with 
each function is obtained by executing the CPUID instruction with the function 
number in the EAX register. Functions are divided into two types: standard functions 
and extended functions. Standard functions are found in the low function space, 
0000_0000h-7FFF_FFFFh. In general, all x86 processors have the same standard 
function definitions. 


Extended functions are defined specifically for processors supplied by the vendor 
listed in the vendor identification string. Extended functions are found in the high 
function space, 8000_0000h-8FFF_FFFFh. Because not all vendors have defined 
extended functions, software must test for their presence on the processor. 
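
The two function spaces can be expressed as a simple range check.
The C sketch below uses hypothetical names. Its support test
assumes the common convention that function 0000_0000h returns
the highest standard function in EAX and function 8000_0000h
returns the highest extended function in EAX; that convention is
not stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CPUID_FN_STANDARD,   /* 0000_0000h-7FFF_FFFFh */
    CPUID_FN_EXTENDED,   /* 8000_0000h-8FFF_FFFFh */
    CPUID_FN_UNDEFINED   /* outside both spaces   */
} cpuid_fn_class;

/* Classify a CPUID function number by the space it falls in. */
static cpuid_fn_class cpuid_classify(uint32_t fn)
{
    if (fn <= 0x7FFFFFFFu)
        return CPUID_FN_STANDARD;
    if (fn <= 0x8FFFFFFFu)
        return CPUID_FN_EXTENDED;
    return CPUID_FN_UNDEFINED;
}

/*
 * max_std: EAX from function 0000_0000h (assumed convention).
 * max_ext: EAX from function 8000_0000h, or 0 if the processor
 *          defines no extended functions (assumed convention).
 */
static bool cpuid_fn_supported(uint32_t fn, uint32_t max_std,
                               uint32_t max_ext)
{
    switch (cpuid_classify(fn)) {
    case CPUID_FN_STANDARD:
        return fn <= max_std;
    case CPUID_FN_EXTENDED:
        return max_ext >= 0x80000000u && fn <= max_ext;
    default:
        return false;
    }
}
```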


For more detailed information refer to the AMD Processor Recognition Application
Note, order# 20734, located at http://www.amd.com.

CMPXCHG8B

mnemonic           opcode    description
CMPXCHG8B r/m64    0F C7h    Compare and exchange 8-byte operand

Privilege:            Any level
Registers Affected:   EAX, EBX, ECX, EDX
Flags Affected:       ZF

Exceptions Generated:

Exception                  Description
Invalid opcode (6)         Invalid opcode if destination is a register.
Stack exception (12)       During instruction execution, the stack segment limit was exceeded.
General protection (13)    During instruction execution, the effective address of one of the segment
                           registers used for the operand points to an illegal memory location.
Page fault (14)            A page fault resulted from the execution of the instruction.
Alignment check (17)       An unaligned memory reference resulted from the instruction execution,
                           and the alignment mask bit (AM) of the control register (CR0) is set to 1.
                           (In Protected Mode, CPL = 3.)

The CMPXCHG8B instruction is an 8-byte version of the 4-byte CMPXCHG
instruction supported by the 486 processor. CMPXCHG8B compares a value from
memory with a value in the EDX and EAX registers, as follows:

■ EDX—Upper 32 bits of compare value
■ EAX—Lower 32 bits of compare value

If the memory value matches the value in EDX and EAX, the ZF flag is set to 1 and the
8-byte value in ECX and EBX is written to the memory location, as follows:

■ ECX—Upper 32 bits of exchange value
■ EBX—Lower 32 bits of exchange value
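
The register roles can be summarized with a software model. The
C sketch below is not atomic and is not a substitute for the
instruction; the type and function names are hypothetical. The
mismatch branch (ZF cleared and the memory value loaded into
EDX:EAX) follows the standard x86 definition of CMPXCHG8B rather
than text in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register snapshot used only by this model. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    bool zf;
} cmpxchg8b_regs;

/* Non-atomic model of CMPXCHG8B on a 64-bit memory operand. */
static void cmpxchg8b_model(uint64_t *mem, cmpxchg8b_regs *r)
{
    uint64_t compare  = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t exchange = ((uint64_t)r->ecx << 32) | r->ebx;

    if (*mem == compare) {
        *mem = exchange;              /* match: store ECX:EBX    */
        r->zf = true;
    } else {
        r->edx = (uint32_t)(*mem >> 32);  /* mismatch: load the  */
        r->eax = (uint32_t)*mem;          /* memory value        */
        r->zf = false;
    }
}
```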

MOV to and from CR4

mnemonic       opcode    description
MOV CR4,r32    0F 22h    Move to CR4 from register
MOV r32,CR4    0F 20h    Move to register from CR4

Privilege:            CPL = 0
Registers Affected:   CR4, 32-bit general-purpose register
Flags Affected:       OF, SF, ZF, AF, PF, and CF are undefined

Exceptions Generated:

Exception                  Description
General protection (13)    If 1 is written to any reserved bits.
                           Executing this instruction in Virtual 8086 mode.
                           If CPL not = 0.


These instructions read and write control register 4 (CR4). 

RDTSC

mnemonic    opcode    description
RDTSC       0F 31h    Read time stamp counter

Privilege:            Selectable by TSD bit in CR4
Registers Affected:   EAX, EDX
Flags Affected:       none

Exceptions Generated:

Exception                  Description
General protection (13)    Executing this instruction in Virtual 8086 mode.
                           If CPL not = 0 when TSD bit of CR4 = 1.


The AMD-K5 processor’s 64-bit time stamp counter (TSC) increments on each 
processor clock. In Real or Protected mode, the counter can be read with the RDMSR 
instruction and written with the WRMSR instruction when CPL = 0. However, in 
Protected mode, the RDTSC instruction can be used to read the counter at privilege 
levels higher than CPL = 0. 





The required privilege level for using the RDTSC instruction is determined by the
Time Stamp Disable (TSD) bit in CR4, as follows:

■ CPL = 0—Set the TSD bit in CR4 to 1
■ Any CPL—Clear the TSD bit in CR4 to 0

The RDTSC instruction reads the counter value into the EDX and EAX registers as
follows:

■ EDX—Upper 32 bits of TSC
■ EAX—Lower 32 bits of TSC


The following example shows how the RDTSC instruction can be used. After this code 
is executed, EAX and EDX contain the time required to execute the RDTSC 
instruction. 


    mov  ecx,10h          ;Time Stamp Counter Access via MSRs
    mov  eax,00000000h    ;Initialize the eax part of the counter to zero
    mov  edx,00000000h    ;Initialize the edx part of the counter to zero
    db   0Fh, 30h         ;WRMSR
    db   0Fh, 31h         ;RDTSC
    db   0Fh, 31h         ;RDTSC

RDMSR and WRMSR

mnemonic    opcode    description
RDMSR       0F 32h    Read model-specific register (MSR)
WRMSR       0F 30h    Write model-specific register (MSR)

Privilege:            CPL = 0
Registers Affected:   EAX, ECX, EDX
Flags Affected:       none

Exceptions Generated:

Exception                  Description
General protection (13)    For unimplemented MSR address.
                           Executing this instruction in Virtual 8086 mode.
                           For unimplemented MSR address OR if CPL not = 0.


The RDMSR or WRMSR instructions can be used in Real or Protected mode to access 
several 64-bit MSRs. These registers are addressed by the value in ECX, as follows: 


■ 00h: Machine-Check Address Register (MCAR). This may contain the physical
  address of the last bus cycle for which the BUSCHK or PCHK signal was asserted.
  For details, see “Machine-Check Address Register (MCAR)” on page 80.

■ 01h: Machine-Check Type Register (MCTR). This contains the cycle definition of
  the last bus cycle for which the BUSCHK or PCHK signal was asserted. For details,
  see “Machine-Check Type Register (MCTR)” on page 80. The processor clears the
  CHK bit (bit 0) in MCTR when the register is read with the RDMSR instruction.

■ 10h: Time Stamp Counter (TSC). This contains a time value. The TSC can be
  initialized to any value with the WRMSR instruction, and it can be read with either
  the RDMSR or RDTSC instruction. For details, see “Time Stamp Counter (TSC)”
  on page 81.

■ 82h: Array Access Register (AAR). This contains an array pointer and test data for
  testing the processor’s cache and TLB arrays. For details on the AAR, see “Cache
  and TLB Testing” on page 27.

■ 83h: Hardware Configuration Register (HWCR). This contains configuration bits
  that control miscellaneous debugging functions. For details, see “Hardware
  Configuration Register (HWCR)” on page 22.

■ 85h: Write Allocate Top-of-Memory and Control Register (WATMCR)

■ 86h: Write Allocate Programmable Memory Range Register (WAPMRR)


Note: The AMD-K5 processor supports write allocate only on Models 1, 2, and 3, 
with a Stepping of 4 or greater. 

The above values in ECX identify the register to be read or written. The EDX and
EAX registers contain the MSR values to be read or written, as follows:

■ EDX—Upper 32 bits of MSR. For the AAR, this contains the array pointer and (in
  contrast to all other MSRs) its contents are not altered by a RDMSR instruction.
■ EAX—Lower 32 bits of MSR. For the AAR, this contains the data to be read/written.


All MSRs are 64 bits wide. However, the upper 32 bits of the AAR are write-only and 
are not returned on a read. EDX remains unaltered, making it more convenient to 
maintain the array pointer. 


If an attempt is made to execute either the RDMSR or WRMSR instruction when CPL 
is greater than 0, or to access an undefined MSR, the processor generates a 
general-protection exception with error code zero. 
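
Software that wants to avoid the exception can check the index
before issuing the instruction. The C sketch below encodes the
AMD-K5 MSR indices listed above; the names are hypothetical, and
treating 85h and 86h as undefined on parts without write allocate
is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMD-K5 MSR indices (values loaded into ECX). */
enum k5_msr {
    K5_MSR_MCAR   = 0x00,
    K5_MSR_MCTR   = 0x01,
    K5_MSR_TSC    = 0x10,
    K5_MSR_AAR    = 0x82,
    K5_MSR_HWCR   = 0x83,
    K5_MSR_WATMCR = 0x85,
    K5_MSR_WAPMRR = 0x86
};

/* True if the index names an MSR described in this section. */
static bool k5_msr_defined(uint32_t ecx, bool has_write_allocate)
{
    switch (ecx) {
    case K5_MSR_MCAR:
    case K5_MSR_MCTR:
    case K5_MSR_TSC:
    case K5_MSR_AAR:
    case K5_MSR_HWCR:
        return true;
    case K5_MSR_WATMCR:
    case K5_MSR_WAPMRR:
        return has_write_allocate;  /* assumption, see lead-in */
    default:
        return false;
    }
}

/* True if RDMSR/WRMSR would raise a general-protection exception. */
static bool k5_msr_access_faults(uint32_t ecx, unsigned cpl,
                                 bool has_write_allocate)
{
    return cpl != 0u || !k5_msr_defined(ecx, has_write_allocate);
}
```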


Model-Specific Registers, as their name implies, may or may not be implemented by 
later models of the AMD-K5 processor. 

RSM

mnemonic    opcode    description
RSM         0F AAh    Resume execution (exit System Management Mode)

Privilege:            CPL = 0
Registers Affected:   CS, DS, ES, FS, GS, SS, EIP, EFLAGS, LDTR,
                      CR3, EAX, EBX, ECX, EDX, ESP, EBP, EDI, ESI
Flags Affected:       none

Exceptions Generated:

Exception             Description
Invalid opcode (6)    Invalid opcode if not in SMM Mode.


The RSM instruction should be the last instruction in any System Management Mode
(SMM) service routine. It restores the processor state that was saved when the SMI
interrupt was asserted. This instruction is only valid when the processor is in SMM. It
generates an invalid opcode exception at all other times.


The processor enters the Shutdown state if any of the following illegal conditions are 
encountered during the execution of the RSM instruction: 

■ The SMM base value is not aligned on a 32-Kbyte boundary

■ Any reserved bit of CR4 is set to 1

■ The PG bit is set while the PE bit is cleared in CR0

■ The NW bit is set while the CD bit is cleared in CR0
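
An SMM handler that edits the state-save area could validate the
values before executing RSM. The C sketch below checks the four
conditions listed above. The function name is hypothetical, the
CR0 bit positions are the standard x86 ones (PE = 0, NW = 29,
CD = 30, PG = 31), and the set of reserved CR4 bits is passed in
as a parameter because it is not given in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE  (1u << 0)
#define CR0_NW  (1u << 29)
#define CR0_CD  (1u << 30)
#define CR0_PG  (1u << 31)

/*
 * Return true if none of the listed shutdown conditions applies.
 * cr4_reserved_mask is supplied by the caller (an assumption).
 */
static bool rsm_state_is_legal(uint32_t smm_base, uint32_t cr0,
                               uint32_t cr4,
                               uint32_t cr4_reserved_mask)
{
    if ((smm_base & 0x7FFFu) != 0u)          /* 32-Kbyte aligned? */
        return false;
    if ((cr4 & cr4_reserved_mask) != 0u)     /* reserved CR4 bit  */
        return false;
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))   /* PG without PE     */
        return false;
    if ((cr0 & CR0_NW) && !(cr0 & CR0_CD))   /* NW without CD     */
        return false;
    return true;
}
```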

Illegal Instruction (Reserved Opcode)

mnemonic    opcode    description
(none)      0F FFh    Illegal instruction (reserved opcode)

Privilege:            Any level
Registers Affected:   none
Flags Affected:       none

Exceptions Generated:

Exception             Description
Invalid opcode (6)    Invalid opcode if executed.


This opcode always generates an invalid opcode exception. The opcode will not be 
used in future AMD K86 processors. 

AMD-K6™ MMX Processor


The following sections describe additional information 
required by BIOS developers to properly incorporate the 
AMD-K6 MMX processor into a system. The BIOS for the 
AMD-K6 needs minimal changes in order to fully support the 
AMD-K6 MMX processor family. 


BIOS Consideration Checklist 


CPUID

■ Use the CPUID instruction to properly identify the AMD-K6
  processor.

■ Determine the processor type, stepping and features using
  functions 0000_0001h and 8000_0001h of the CPUID
  instruction.

■ Boot-up display: The processor name should be displayed as
  ‘AMD-K6/PR2-XXX’. See “CPU Identification Algorithms”
  on page 3 for more information.

CPU Speed Detection

■ Use speed detection algorithms that do not rely on
  repetitive instruction sequences.

■ Use the Time Stamp Counter (TSC) to ‘clock’ a timed
  operation and compare the result to the Real Time Clock
  (RTC) to determine the operating frequency. See the
  example of frequency-determination assembler code
  available on the AMD website at http://www.amd.com.

■ Display the P-Rating shown in Table 2, “Summary of
  AMD-K6™ MMX Processor CPU IDs and BIOS Boot Strings,”
  on page 4.
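
The arithmetic behind the TSC-based method is short enough to
show. The C sketch below is not the AMD frequency-determination
code referenced above; it is a hypothetical helper that assumes
the caller has sampled the TSC before and after an interval whose
length in microseconds was measured against the RTC or another
timer.

```c
#include <stdint.h>

/*
 * Estimate the core frequency in MHz from two TSC samples and
 * the elapsed time between them. Clocks per microsecond equals
 * MHz. Returns 0 if elapsed_us is 0.
 */
static uint32_t cpu_mhz_from_tsc(uint64_t tsc_start, uint64_t tsc_end,
                                 uint32_t elapsed_us)
{
    uint64_t cycles;

    if (elapsed_us == 0u)
        return 0u;
    cycles = tsc_end - tsc_start;   /* wraps correctly modulo 2^64 */
    /* Round to the nearest whole MHz. */
    return (uint32_t)((cycles + elapsed_us / 2u) / elapsed_us);
}
```

The result is the measured clock frequency; the boot string still
uses the P-Rating from Table 2 rather than this number directly.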


Model-Specific Registers (MSRs)

■ Only access MSRs implemented in the AMD-K6 processor.

■ Enable Write Allocation by programming the Write
  Handling Control Register (WHCR). See “Write Handling
  Control Register (WHCR)” on page 119 and the
  Implementation of Write Allocate in the K86™ Processors
  Application Note, order# 21326 for more information.

Cache Testing

■ Use the AMD-K6 processor’s BIST function to test internal
  memories. See “Built-In Self-Test (BIST)” on page 106 for
  more information. The AMD-K6 does not contain MSRs to
  allow for cache testing.

SMM Issues

■ The System Management Mode (SMM) functionality of the
  AMD-K6 processor is identical to Pentium.

■ Implement the AMD-K6 processor SMM state-save area in
  the same manner as Pentium except for the IDT Base and
  possibly Pentium-reserved areas. See “AMD-K6™ MMX
  Processor System Management Mode” on page 97 for more
  information.

AMD-K6™ MMX Processor System Management Mode


The System Management Mode (SMM) in the AMD-K6 MMX
processor is similar to that of the AMD-K5 processor. This section
points out the differences. See “AMD-K5™ Processor System
Management Mode (SMM)” on page 7 for details on the
AMD-K5 processor implementation of SMM.


Initial Register Values


The general purpose registers and DR6 are unmodified when
entering SMM. Table 30 shows the default register values when
entering SMM.

Table 30. Initial State of Registers in SMM

Register                     Initial Contents
CS                           Selector 3000h, base 0003_0000h, limit 4 Gbytes
EIP                          0000_8000h
General-Purpose Registers    Unmodified
CR0                          Bits 0, 2, 3, and 31 cleared (PE, EM, TS, and PG); remainder are unmodified.
GDTR                         Unmodified
LDTR                         Unmodified
IDTR                         Unmodified

[Remaining rows, including the other segment registers with
4-Gbyte limits and EFLAGS, are not legible.]

SMM State-Save Area


When the SMI is recognized, the AMD-K6 MMX processor saves
its state to the state-save area shown in Table 31. If the SMI has 
been relocated, the state dump begins at CS Base + 7FFFh 
(8000 + 7FFFh). The default CS Base is 30000h. 


Table 31. AMD-K6™ MMX Processor State-Save Map

The table compares the AMD-K5 and AMD-K6 contents at each
address offset.

Address Offset    Contents
FFA4h             I/O Trap Dword
FFA0h             —
FF9Ch             I/O Trap EIP*

[Rows for offsets FFFCh through FFA8h (saved EFLAGS, control,
general-purpose, debug, and segment registers, and the LDTR base)
and for offsets FF98h through FF10h are not legible.]

Notes:
— No dump at that address.
* Only contains information if SMI was asserted on a valid corresponding I/O.

AMD Preliminary Information
AMD K86™ Family BIOS and Software Tools Developers Guide 21062C/0—March 1997 


Table 31. AMD-K6™ MMX Processor State-Save Map (continued)

Address Offset    Contents
FF0Ch             I/O restart ESI*
FF08h             I/O restart ECX*
FF04h             I/O restart EDI*
FF00h             I/O Restart Slot

[Rows for the remaining offsets are not legible.]

Notes:
— No dump at that address.
* Only contains information if SMI was asserted on a valid corresponding I/O.


SMM Revision Identifier 


The SMM Revision Identifier specifies the version of SMM and
the extensions available on the processor. Table 32 defines the
bits associated with this register. A 1 in either the I/O
Trap Extension bit or the SMM Base Relocation bit indicates
that the corresponding feature is available for use.


Table 32. SMM Revision Identifier

SMM Base Relocation | I/O Trap Extension | SMM Revision Level

SMM Base Address 


This feature is compatible with the AMD-K5 processor and 
Pentium. See “SMM Base Address” on page 12. 

The Auto Halt Restart feature is compatible with the AMD-K5
processor and Pentium. See “Auto Halt Restart Slot” on page 13.


If the assertion of SMI is recognized on the boundary of an I/O
bus cycle, the I/O trap doubleword at offset FFA4h in the SMM
state-save area contains information about the associated I/O
instruction. The AMD-K6 MMX processor provides additional
information at this offset when compared to the AMD-K5
processor. The AMD-K6 processor provides a bit to determine if
the I/O string operand is a REP string operation. The fields of
the I/O Trap Dword are configured as shown in Table 33.


Table 33. AMD-K6™ MMX Processor I/O Trap Dword Configuration

I/O Port Address | Reserved | REP String Operation | I/O String Operation | Valid I/O Instruction | Input or Output


I/O Trap Restart

This feature is compatible with the AMD-K5 processor. See “I/O
Trap Restart Slot” on page 14.


Exceptions and Interrupts Within SMM 


This feature is compatible with the AMD-K5. See “Exceptions 
and Interrupts in SMM” on page 16. 

AMD-K6™ MMX Processor Reset State


Table 34 shows the state of all architecture registers and MSRs 
after the processor has completed its initialization resulting 
from the recognition of the assertion of RESET. 


Table 34. State of the AMD-K6™ MMX Processor After RESET 


Register           RESET State
GDTR               base:0000_0000h limit:0FFFFh
IDTR               base:0000_0000h limit:0FFFFh
FPU Stack R7-R0    0000_0000_0000_0000_0000h
FPU Status Word    0000h

[Rows for the remaining registers, including the general-purpose,
segment, control, and FPU control registers, are not legible.]

Notes:
1. The contents of EAX indicate if BIST was successful. If EAX = 0000_0000h, then BIST
   was successful. If EAX is non-zero, BIST failed.
2. EDX contains the AMD-K6 MMX processor signature.
3. These Model-Specific Registers are described in “AMD-K6™ MMX Processor x86
   Architecture Extensions” on page 117.


Table 34. State of the AMD-K6™ MMX Processor After RESET (continued)

Register                   RESET State
FPU Instruction Pointer    0000_0000_0000h
MCAR                       0000_0000_0000_0000h
MCTR                       0000_0000_0000_0000h

[Rows for the FPU Tag Word, DR7, and the remaining registers are
not legible.]

Notes:
1. The contents of EAX indicate if BIST was successful. If EAX = 0000_0000h, then BIST
   was successful. If EAX is non-zero, BIST failed.
2. EDX contains the AMD-K6 MMX processor signature.
3. These Model-Specific Registers are described in “AMD-K6™ MMX Processor x86
   Architecture Extensions” on page 117.

Segment Register Attributes


See Table 10 on page 20 for segment register attribute initial
values.

State of the AMD-K6™ MMX Processor After INIT


The assertion of INIT causes the processor to empty its
pipelines, initialize most of its internal state, and branch to
address FFFF_FFF0h—the same instruction execution starting
point used after RESET. Unlike RESET, the processor
preserves the contents of its caches, the floating-point state, the
SMM base, the MMX state, MSRs, and the CD and NW bits of
the CR0 register.


The edge-sensitive interrupts FLUSH and SMI are sampled and 
preserved during the INIT process and are handled accordingly 
after the initialization is complete. However, the processor 
resets any pending NMI interrupt upon sampling INIT asserted. 


INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode. 


AMD-K6™ MMX Processor Cache 


The internal L1 cache of the AMD-K6 MMX processor consists 
of two separate caches—a 32-Kbyte instruction cache and a 
32-Kbyte data cache. The instruction cache also incorporates a 
20-Kbyte pre-decode cache in addition to a 64-entry TLB. The 
data cache utilizes a 128-entry TLB. The cache line is 32 bytes 
wide. Two adjacent cache lines are associated with each tag (a 
64-byte sector with two 32-byte cache lines). 
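
The numbers above imply how many lines and tags each cache
holds. The C sketch below works that out; the function names are
hypothetical and only the sizes stated in the text are used.

```c
#include <stdint.h>

/* Number of cache lines in a cache of the given size. */
static uint32_t cache_lines(uint32_t cache_bytes, uint32_t line_bytes)
{
    return cache_bytes / line_bytes;
}

/* Number of tags (sectors) when several lines share one tag. */
static uint32_t cache_tags(uint32_t cache_bytes, uint32_t line_bytes,
                           uint32_t lines_per_tag)
{
    return cache_lines(cache_bytes, line_bytes) / lines_per_tag;
}

/* Bytes covered by one tag. */
static uint32_t sector_bytes(uint32_t line_bytes, uint32_t lines_per_tag)
{
    return line_bytes * lines_per_tag;
}
```

For a 32-Kbyte cache with 32-byte lines and two lines per tag,
this gives 1024 lines, 512 tags, and 64-byte sectors.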


The AMD-K5 processor uses the Array Access Register (AAR),
an MSR that allows for testing of the processor caches. The
AMD-K6 processor does not contain this feature. Instead, the
AMD-K6 contains a built-in self-test (BIST) for all internal
memories. Cache information can still be obtained by
utilizing the CPUID instruction. For more detailed information
refer to the AMD Processor Recognition Application Note, order#
20734, located at http://www.amd.com.


Function 8000_0005h of the CPUID instruction returns 
processor cache information. Table 35 shows the information 
returned by the CPUID instruction when EAX = 8000_0005h. 

Table 35. Data Returned by the CPUID Instruction

Register    Data
ECX         L1 data cache—Lines per tag
ECX         L1 data cache—Line size (bytes)

[Remaining rows are not legible.]

Note:
Full associativity is indicated by a value of FFh.
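
A decoder for the cache descriptor might look like the C sketch
below. The byte positions are an assumption: they follow the
layout AMD documents for function 8000_0005h (size in Kbytes in
bits 31-24, associativity in bits 23-16, lines per tag in bits
15-8, line size in bits 7-0) and are not taken from the table
above. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an L1 cache descriptor. */
typedef struct {
    uint8_t size_kbytes;
    uint8_t associativity;
    uint8_t lines_per_tag;
    uint8_t line_bytes;
} l1_cache_info;

/* Decode ECX (data cache) from CPUID function 8000_0005h. */
static l1_cache_info l1_cache_decode(uint32_t reg)
{
    l1_cache_info c;
    c.size_kbytes   = (uint8_t)(reg >> 24);
    c.associativity = (uint8_t)(reg >> 16);
    c.lines_per_tag = (uint8_t)(reg >> 8);
    c.line_bytes    = (uint8_t)reg;
    return c;
}

/* A value of FFh in the associativity field means fully associative. */
static bool l1_cache_fully_associative(l1_cache_info c)
{
    return c.associativity == 0xFFu;
}
```

As an illustrative value only, 2002_0220h would decode to a
32-Kbyte, two-way cache with two 32-byte lines per tag.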





AMD-K6™ MMX Processor Test and Debug 


The AMD-K6 MMX processor implements various test and 
debug modes to enable the functional and manufacturing 
testing of systems and boards that use the processor. In 
addition, the debug features of the processor allow designers to 
debug the instruction execution of software components. This 
section describes the following test and debug features: 


■ Built-In Self-Test (BIST)—The BIST, which is invoked after 
the falling transition of RESET, runs internal tests that 
exercise most on-chip RAM and ROM structures. 

■ Tri-State Test Mode—A test mode that causes the processor 
to float its output and bidirectional pins. 

■ Boundary-Scan Test Access Port (TAP)—The Joint Test Action 
Group (JTAG) test access function defined by the IEEE 
Standard Test Access Port and Boundary-Scan Architecture 
(IEEE 1149.1-1990) specification. 

■ Level-One (L1) Cache Inhibit—A feature that disables the 
processor’s internal L1 instruction and data caches. 

■ Debug Support—Consists of all x86-compatible software 
debug features, including the debug extensions. 


Built-In Self-Test (BIST) 


Following the falling transition of RESET, the processor 
unconditionally runs its BIST. The internal resources tested 
during BIST include the following: 


■ L1 instruction and data caches 
■ Instruction and Data Translation Lookaside Buffers (TLBs) 
■ Microcode Read-Only Memory (ROM) 
■ Programmable Logic Arrays 


The contents of the EAX general-purpose register after the 
completion of RESET indicate if the BIST was successful. If 
EAX contains 0000_0000h, then BIST was successful. If EAX is 
non-zero, the BIST failed. Following the completion of the BIST, 
the processor jumps to address FFFF_FFF0h to start 
instruction execution, regardless of the outcome of the BIST. 


The BIST takes approximately 295,000 processor clocks to 
complete. 


Tri-State Test Mode 


The Tri-State Test mode causes the processor to float its output 
and bidirectional pins, which is useful for board-level 
manufacturing testing. In this mode, the processor is 
electrically isolated from other components on a system board, 
allowing automated test equipment (ATE) to test those 
components that drive the same signals as those the processor 
floats. 


If the FLUSH signal is sampled Low during the falling 
transition of RESET, the processor enters the Tri-State Test 
mode. See the AMD-K6 MMX Processor Data Sheet, order# 
20695, for more information. 

Boundary-Scan Test Access Port (TAP) 


TAP Registers 


The boundary-scan Test Access Port (TAP) is an IEEE standard 
that defines synchronous scanning test methods for complex logic 
circuits, such as boards containing a processor. The AMD-K6 
MMX processor supports the TAP standard defined in the IEEE 
Standard Test Access Port and Boundary-Scan Architecture (IEEE 
1149.1-1990) specification. 


Boundary scan testing uses a shift register consisting of the 
serial interconnection of boundary-scan cells that correspond 
to each I/O buffer of the processor. This non-inverting register 
chain, called a Boundary Scan Register (BSR), is used to 
capture the state of every processor pin and to drive every 
processor output and bidirectional pin to a known state. 


Each BSR of every component on a board that implements the 
boundary-scan architecture can be serially interconnected to 
enable component interconnect testing. 


The AMD-K6 processor provides an Instruction Register (IR) and 
three Test Data Registers (TDR) to support the boundary-scan 
architecture. The IR and one of the TDRs—the Boundary-Scan 
Register (BSR)—consist of a shift register and an output register. 
The shift register is loaded in parallel in the Capture states (see 
the IEEE Standard Test Access Port and Boundary-Scan Architecture 
(IEEE 1149.1-1990) specification for more information). In 
addition, the shift register is loaded and shifted serially in the 
Shift states. The output register is loaded in parallel from its 
corresponding shift register in the Update states. 


Instruction Register (IR). The IR is a 5-bit register, without parity, 
that determines which instruction to run and which test data 
register to select. When the TAP controller enters the 
Capture-IR state, the processor loads the following bits into the 
IR shift register: 


■ 01b—Loaded into the two least significant bits, as specified 
by the IEEE 1149.1 standard 

■ 000b—Loaded into the three most significant bits 


Loading 00001b into the IR shift register during the Capture-IR 
state results in loading the SAMPLE/PRELOAD instruction. 


For each entry into the Shift-IR state, the IR shift register is 
serially shifted by one bit toward the TDO pin. During the shift, 
the most significant bit of the IR shift register is loaded from 
the TDI pin. 
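
The Capture-IR and Shift-IR behavior described above can be modeled in a few lines of C. This is only a sketch: it assumes that the least significant bit of the IR shift register is the one that leaves on TDO (consistent with the shift "toward the TDO pin" and with TDI feeding the most significant bit), and the function names are hypothetical.

```c
#include <stdint.h>

#define IR_BITS 5u
#define IR_MASK ((1u << IR_BITS) - 1u)

/* Value loaded into the IR shift register in the Capture-IR state (00001b). */
#define IR_CAPTURE_VALUE 0x01u

/* One Shift-IR step: bit 0 is presented on TDO, TDI enters at bit 4. */
static uint8_t ir_shift(uint8_t ir, unsigned tdi, unsigned *tdo)
{
    *tdo = ir & 1u;
    return (uint8_t)(((ir >> 1) | ((tdi & 1u) << (IR_BITS - 1u))) & IR_MASK);
}

/* Shift a complete 5-bit instruction in, least significant bit first. */
static uint8_t ir_load(uint8_t ir, uint8_t instruction)
{
    unsigned tdo;
    for (unsigned i = 0; i < IR_BITS; i++)
        ir = ir_shift(ir, (instruction >> i) & 1u, &tdo);
    return ir;
}
```

After five shifts the captured 00001b pattern has been scanned out on TDO and the new instruction occupies the shift register, ready for the Update-IR state.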

The IR output register is loaded from the IR shift register in the 
Update-IR state, and the current instruction is defined by the 
IR output register. See “TAP Instructions” on page 111 for a 
list and definition of the instructions supported by the 
AMD-K6. 


Boundary Scan Register (BSR). The BSR is a Test Data Register 
consisting of the interconnection of 152 boundary-scan cells. 
Each output and bidirectional pin of the processor requires a 
two-bit cell, where one bit corresponds to the pin and the other 
bit is the output enable for the pin. When a 0 is shifted into the 
enable bit of a cell, the corresponding pin is floated, and when a 
1 is shifted into the enable bit, the pin is driven valid. Each 
input pin requires a one-bit cell that corresponds to the pin. 
The last cell of the BSR is reserved and does not correspond to 
any processor pin. 


The total number of bits that comprise the BSR is 281. Table 36 
on page 109 lists the order of these bits, where TDI is the input 
to bit 280, and TDO is driven from the output of bit 0. The 
entries listed as pin_E (where pin is an output or bidirectional 
signal) are the enable bits. 


If the BSR is the register selected by the current instruction 
and the TAP controller is in the Capture-DR state, the 
processor loads the BSR shift register as follows: 


■ If the current instruction is SAMPLE/PRELOAD, then the 
current state of each input, output, and bidirectional pin is 
loaded. A bidirectional pin is treated as an output if its 
enable bit equals 1, and it is treated as an input if its enable 
bit equals 0. 

■ If the current instruction is EXTEST, then the current state 
of each input pin is loaded. A bidirectional pin is treated as 
an input, regardless of the state of its enable. 


While in the Shift-DR state, the BSR shift register is serially 
shifted toward the TDO pin. During the shift, bit 280 of the BSR 
is loaded from the TDI pin. 


The BSR output register is loaded with the contents of the BSR 
shift register in the Update-DR state. If the current instruction 
is EXTEST, the processor’s output pins, as well as those 
bidirectional pins that are enabled as outputs, are driven with 
their corresponding values from the BSR output register. 

Table 36. Boundary Scan Register Bit Definitions 

Device Identification Register (DIR). The DIR is a 32-bit Test Data 
Register selected during the execution of the IDCODE 
instruction. The fields of the DIR and their values are shown in 
Table 37 and are defined as follows: 


■ Version Code—This 4-bit field is incremented by AMD 
manufacturing for each major revision of silicon. 

■ Part Number—This 16-bit field identifies the specific 
processor model. 

■ Manufacturer—This 11-bit field identifies the manufacturer 
of the component (AMD). 

■ LSB—The least significant bit (LSB) of the DIR is always set 
to 1, as specified by the IEEE 1149.1 standard. 






Table 37. AMD-K6™ MMX Processor Device Identification Register 

Version Code    Part Number     Manufacturer    LSB 
(Bits 31-28)    (Bits 27-12)    (Bits 11-1)     (Bit 0) 
Xh              0560h           00000000001b    1b 
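
The DIR fields can be separated from a 32-bit word scanned out with IDCODE as sketched below. The bit ranges are the ones given in Table 37; the type and function names are hypothetical and only illustrate the layout.

```c
#include <stdint.h>

/* Hypothetical holder for the four DIR fields. */
typedef struct {
    unsigned version;       /* bits 31-28 */
    unsigned part_number;   /* bits 27-12 */
    unsigned manufacturer;  /* bits 11-1 */
    unsigned lsb;           /* bit 0, always 1 per IEEE 1149.1 */
} dir_fields;

/* Split a 32-bit DIR value into its fields. */
static dir_fields decode_dir(uint32_t dir)
{
    dir_fields f;
    f.version      = (dir >> 28) & 0xFu;
    f.part_number  = (dir >> 12) & 0xFFFFu;
    f.manufacturer = (dir >> 1) & 0x7FFu;
    f.lsb          = dir & 1u;
    return f;
}
```

A test program might reject any scanned value whose `lsb` field is 0, since that cannot be a valid identification word.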


Bypass Register (BR). The BR is a Test Data Register consisting of 
a 1-bit shift register that provides the shortest path between 
TDI and TDO. When the processor is not involved in a test 
operation, the BR can be selected by an instruction to allow the 
transfer of test data through the processor without having to 
serially scan the test data through the BSR. This functionality 
preserves the state of the BSR and significantly reduces test 
time. 


The BR register is selected by the BYPASS and HIGHZ 
instructions as well as by any instructions not supported by the 
AMD-K6. 


TAP Instructions 

The processor supports the three instructions required by the 
IEEE 1149.1 standard—EXTEST, SAMPLE/PRELOAD, and 
BYPASS—as well as two additional optional instructions— 
IDCODE and HIGHZ. 


Table 38 shows the complete set of TAP instructions supported 
by the processor along with the 5-bit Instruction Register 
encoding and the register selected by each instruction. 


Table 38. Supported TAP Instructions 

Instruction       Encoding        Register   Description 
EXTEST (1)        00000b          BSR        Sample inputs and drive outputs 
SAMPLE/PRELOAD    00001b          BSR        Sample inputs and outputs, then load the BSR 
IDCODE            00010b          DIR        Read DIR 
HIGHZ             00011b          BR         Float outputs and bidirectional pins 
BYPASS (2)        00100b-11110b   BR         Undefined instruction, execute the BYPASS instruction 
BYPASS (3)        11111b          BR         Connect TDI to TDO to bypass the BSR 

1. Following the execution of the EXTEST instruction, the processor must be reset in order to return to normal, non-test operation. 
2. These instruction encodings are undefined on the AMD-K6 MMX processor and default to the BYPASS instruction. 
3. Because the TDI input contains an internal pullup, the BYPASS instruction is executed if the TDI input is not connected or open 
during an instruction scan operation. The BYPASS instruction does not affect the normal operational state of the processor. 
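
The mapping from IR encoding to instruction and selected register can be sketched as a pair of lookup functions. This is an illustration only: the encodings 00001b, 00010b, and 00011b come from this section, while 00000b for EXTEST and 11111b for BYPASS are assumed from the IEEE 1149.1-1990 requirements. All identifiers are hypothetical.

```c
#include <stdint.h>

typedef enum {
    TAP_EXTEST, TAP_SAMPLE_PRELOAD, TAP_IDCODE, TAP_HIGHZ, TAP_BYPASS
} tap_instruction;

typedef enum { TDR_BSR, TDR_DIR, TDR_BR } tap_register;

/* Decode the 5-bit IR output register into an instruction. */
static tap_instruction tap_decode(uint8_t ir)
{
    switch (ir & 0x1Fu) {
    case 0x00: return TAP_EXTEST;          /* assumed all zeros per IEEE 1149.1 */
    case 0x01: return TAP_SAMPLE_PRELOAD;  /* 00001b */
    case 0x02: return TAP_IDCODE;          /* 00010b */
    case 0x03: return TAP_HIGHZ;           /* 00011b */
    default:   return TAP_BYPASS;          /* 11111b and every undefined encoding */
    }
}

/* Return the test data register placed between TDI and TDO. */
static tap_register tap_selected_register(tap_instruction insn)
{
    switch (insn) {
    case TAP_EXTEST:
    case TAP_SAMPLE_PRELOAD: return TDR_BSR;
    case TAP_IDCODE:         return TDR_DIR;
    default:                 return TDR_BR;  /* HIGHZ and BYPASS */
    }
}
```

Treating every unlisted encoding as BYPASS mirrors the default behavior described for undefined instructions.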







EXTEST. When the EXTEST instruction is executed, the 
processor loads the BSR shift register with the current state of 
the input and bidirectional pins in the Capture-DR state and 
drives the output and bidirectional pins with the corresponding 
values from the BSR output register in the Update-DR state. 


SAMPLE/PRELOAD. The SAMPLE/PRELOAD instruction performs 
two functions. These functions are as follows: 

■ During the Capture-DR state, the processor loads the BSR 
shift register with the current state of every input, output, 
and bidirectional pin. 

■ During the Update-DR state, the BSR output register is 
loaded from the BSR shift register in preparation for the 
next EXTEST instruction. 

The SAMPLE/PRELOAD instruction does not affect the normal 
operational state of the processor. 


BYPASS. The BYPASS instruction selects the BR register, which 
reduces the boundary-scan length through the processor from 
281 to one (TDI to BR to TDO). The BYPASS instruction does 
not affect the normal operational state of the processor. 


IDCODE. The IDCODE instruction selects the DIR register, 
allowing the device identification code to be shifted out of the 
processor. This instruction is loaded into the IR when the TAP 
controller is reset. The IDCODE instruction does not affect the 
normal operational state of the processor. 


HIGHZ. The HIGHZ instruction forces all output and 
bidirectional pins to be floated. During this instruction, the BR 
is selected and the normal operational state of the processor is 
not affected. 


L1 Cache Inhibit 


Purpose 


The AMD-K6 MMX processor provides a means for inhibiting 
the normal operation of its L1 instruction and data caches while 
still supporting an external Level-2 (L2) cache. This capability 
allows system designers to disable the L1 cache during the 
testing and debug of an L2 cache. 


If the Cache Inhibit bit (bit 3) of Test Register 12 (TR12) is set 
to 0, the processor’s L1 cache is enabled and operates as 
described in the Cache Organization section of the AMD-K6 
MMX Processor Data Sheet, order# 20695. If the Cache Inhibit 
bit is set to 1, the L1 cache is disabled and no new cache lines 
are allocated. Even though new allocations do not occur, valid 
L1 cache lines remain valid and are read by the processor when 
a requested address hits a cache line. In addition, the processor 
continues to support inquire cycles initiated by the system 
logic, including the execution of writeback cycles when a 
modified cache line is hit. 


While the L1 is inhibited, the processor continues to drive the 
PCD output signal appropriately, which system logic can use to 
control external L2 caching. 


In order to completely disable the L1 cache so no valid lines 
exist in the cache, the Cache Inhibit bit must be set to 1 and the 
cache must be flushed in one of the following ways: 

■ By asserting the FLUSH input signal 
■ By executing the WBINVD instruction 
■ By executing the INVD instruction (modified cache lines are 
not written back to memory) 


Debug 


The AMD-K6 processor implements the standard x86 debug 
functions, registers, and exceptions. In addition, the processor 
supports the I/O breakpoint debug extension. The debug 
feature assists programmers and system designers during 
software execution tracing by generating exceptions when one 
or more events occur during processor execution. The 
exception handler, or debugger, can be written to perform 
various tasks, such as displaying the conditions that caused the 
breakpoint to occur, displaying and modifying register or 
memory contents, or single-stepping through program 
execution. 


The following sections describe the debug registers and the 
various types of breakpoints and exceptions supported by the 
processor. 


For more details on the register definitions see the Test and 
Debug chapter in the AMD-K6 MMX Processor Data Sheet, 
order# 20695. 


Debug Registers 


Figures 24 through 27 show the 32-bit debug registers 
supported by the processor. Table 39 provides LEN and RW 
information for DR7 as displayed in Figure 24. 

Symbol   Description                            Bits 
LEN 3    Length of Breakpoint #3                31-30 
R/W 3    Type of Transaction(s) to Trap         29-28 
LEN 2    Length of Breakpoint #2                27-26 
R/W 2    Type of Transaction(s) to Trap         25-24 
LEN 1    Length of Breakpoint #1                23-22 
R/W 1    Type of Transaction(s) to Trap         21-20 
LEN 0    Length of Breakpoint #0                19-18 
R/W 0    Type of Transaction(s) to Trap         17-16 
GD       General Detect Enabled                 13 
GE       Global Exact Breakpoint Enabled        9 
LE       Local Exact Breakpoint Enabled         8 
G3       Global Exact Breakpoint # 3 Enabled    7 
L3       Local Exact Breakpoint # 3 Enabled     6 
G2       Global Exact Breakpoint # 2 Enabled    5 
L2       Local Exact Breakpoint # 2 Enabled     4 
G1       Global Exact Breakpoint # 1 Enabled    3 
L1       Local Exact Breakpoint # 1 Enabled     2 
G0       Global Exact Breakpoint # 0 Enabled    1 
L0       Local Exact Breakpoint # 0 Enabled     0 


Figure 24. Debug Register DR7 







Table 39. DR7 LEN and RW Definitions 


Four-byte I/O Read or Write 
One-byte Data Read or Write 
Two-byte Data Read or Write 
Four-byte Data Read or Write 


1. A LEN value of 10b is undefined. 
2. When RW equals 00b, LEN must be equal to 00b. 
3. When RW equals 10b, debugging extensions (DE) must be enabled (bit 3 of CR4 must be set to 1). If DE is set to 0, RW equal to 10b is undefined. 
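
A debugger typically composes the DR7 value from a breakpoint number, a transaction type, and a length. The sketch below shows one way to do that in C. The LEN and RW field positions are those listed for Figure 24; the RW and LEN encodings and the position of the local enable bits (bit 2n for breakpoint n) are assumed from the standard x86 debug architecture, and the identifiers are hypothetical.

```c
#include <stdint.h>

/* RW and LEN encodings (standard x86 values, assumed to apply here). */
enum { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_RDWR = 3 };
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Return dr7 with breakpoint n (0-3) programmed and locally enabled. */
static uint32_t dr7_set_breakpoint(uint32_t dr7, unsigned n,
                                   unsigned rw, unsigned len)
{
    unsigned field;

    n &= 3u;
    if (rw == DR7_RW_EXEC)
        len = DR7_LEN_1;                 /* RW = 00b requires LEN = 00b */

    field = 16u + 4u * n;                /* RWn at field, LENn at field + 2 */
    dr7 &= ~((uint32_t)0xFu << field);   /* clear the old RWn and LENn */
    dr7 |= (uint32_t)(rw & 3u) << field;
    dr7 |= (uint32_t)(len & 3u) << (field + 2u);
    dr7 |= (uint32_t)1u << (2u * n);     /* Ln enable bit (assumed bit 2n) */
    return dr7;
}
```

The value would then be written to DR7 with a privileged MOV instruction after the matching linear address has been placed in DR0 through DR3. An I/O breakpoint (`DR7_RW_IO`) additionally requires CR4.DE to be set, as the table notes state.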

Symbol   Description                         Bit 
BT       Breakpoint Task Switch              15 
BS       Breakpoint Single Step              14 
BD       Breakpoint Debug Access Detected    13 
B3       Breakpoint #3 Condition Detected    3 
B2       Breakpoint #2 Condition Detected    2 
B1       Breakpoint #1 Condition Detected    1 
B0       Breakpoint #0 Condition Detected    0 


Figure 25. Debug Register DR6 


Figure 26. Debug Registers DR5 and DR4 

Register   Contents                              Bits 
DR3        Breakpoint 3 32-bit Linear Address    31-0 
DR2        Breakpoint 2 32-bit Linear Address    31-0 
DR1        Breakpoint 1 32-bit Linear Address    31-0 
DR0        Breakpoint 0 32-bit Linear Address    31-0 


Figure 27. Debug Registers DR3, DR2, DR1, and DR0 


Software Developers Manual 

For additional details on the debug feature, refer to the 
Debugging section in the AMD K86™ Family Software 
Developers Manual, order# 20697. This document will be 
available in June, 1997. 

AMD-K6™ MMX Processor x86 Architecture Extensions 


This section documents the extensions that have been added to 
the AMD-K6 MMX processor. 


Model-Specific Registers (MSR) 


The AMD-K6 processor provides the following seven MSRs. The 
contents of ECX select the MSR to be addressed by the 
RDMSR and WRMSR instructions. 

■ Machine-Check Address Register (MCAR)—ECX = 00h 
■ Machine-Check Type Register (MCTR)—ECX = 01h 
■ Test Register 12 (TR12)—ECX = 0Eh 
■ Time Stamp Counter (TSC)—ECX = 10h 
■ Extended Feature Enable Register 
(EFER)—ECX = C000_0080h 
■ SYSCALL Target Address Register 
(STAR)—ECX = C000_0081h 
■ Write Handling Control Register 
(WHCR)—ECX = C000_0082h 


These seven MSRs are read and written by the RDMSR and 
WRMSR instructions. (The TSC can also be read by the RDTSC 
instruction.) The target register for the RDMSR and WRMSR 
instructions is addressed by the contents of ECX. The only 
values allowed in ECX by the AMD-K6 processor are 00h, 01h, 
0Eh, 10h, C000_0080h, C000_0081h, and C000_0082h for the 
MCAR, MCTR, TR12, TSC, EFER, STAR, and WHCR registers, 
respectively. The usage of any other value in ECX 
results in a general protection exception. 


Machine-Check Address Register (MCAR) 

See Figure 20 on page 80 and “Machine Check Exception” on 
page 121. 


Machine-Check Type Register (MCTR) 

See Figure 21 on page 81 and “Machine Check Exception” on 
page 121. 
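
The ECX values accepted by RDMSR and WRMSR, as listed at the start of this section, can be collected into a small table so that software checks an address before issuing the instruction. This is a sketch only; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR addresses (ECX values) listed for the AMD-K6 processor. */
#define MSR_MCAR 0x00000000u
#define MSR_MCTR 0x00000001u
#define MSR_TR12 0x0000000Eu
#define MSR_TSC  0x00000010u
#define MSR_EFER 0xC0000080u
#define MSR_STAR 0xC0000081u
#define MSR_WHCR 0xC0000082u

/* True if ECX selects an implemented MSR. Any other value would cause
 * a general protection exception when used with RDMSR or WRMSR. */
static bool k6_msr_is_valid(uint32_t ecx)
{
    switch (ecx) {
    case MSR_MCAR: case MSR_MCTR: case MSR_TR12: case MSR_TSC:
    case MSR_EFER: case MSR_STAR: case MSR_WHCR:
        return true;
    default:
        return false;
    }
}
```

A check of this kind is only meaningful after CPUID has identified the processor as an AMD-K6, because other processors implement different MSR sets.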

Test Register 12 (TR12) 

The AMD-K6 MMX processor also provides the 64-bit Test 
Register 12 (TR12), but only the function of the Cache Inhibit 
(CI) bit (bit 3 of TR12) is supported. All other bits in TR12 have 
no effect on the processor’s operation. The I/O Trap Restart 
function (bit 9 of TR12) is always enabled on the AMD-K6. 
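
Because only the CI bit has an effect, software normally changes TR12 with a read-modify-write sequence. The sketch below computes the value to write back; it deliberately omits the privileged RDMSR and WRMSR instructions (ECX = 0Eh) and the cache flush that is needed to remove lines that are already valid. The names are hypothetical.

```c
#include <stdint.h>

#define TR12_CI (1ull << 3)   /* Cache Inhibit, bit 3 of TR12 */

/* Return the TR12 value with the Cache Inhibit bit set or cleared. */
static uint64_t tr12_with_cache_inhibit(uint64_t tr12, int inhibit)
{
    return inhibit ? (tr12 | TR12_CI) : (tr12 & ~TR12_CI);
}
```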


Time Stamp Counter (TSC) 

See “Time Stamp Counter (TSC)” on page 81. 


Extended Feature Enable Register (EFER) 

The Extended Feature Enable Register (EFER) contains the 
control bits that enable the extended features of the AMD-K6 
processor. Figure 28 shows the format of the EFER register, 
and Table 40 defines the function of each bit of the EFER 
register. The EFER register is MSR C000_0080h. 





Symbol   Description              Bit 
SCE      System Call Extension    0 


Figure 28. Extended Feature Enable Register (EFER) 


Table 40. Extended Feature Enable Register (EFER) Definition 

Bit     Description                    Function 
63-1    Reserved                       Writing a 1 to any reserved bit causes a general protection 
                                       fault to occur. All reserved bits are always read as 0. 
0       System Call Extension (SCE)    SCE must be set to 1 to enable the usage of the SYSCALL and 
                                       SYSRET instructions. 









SYSCALL Target Address Register (STAR) 

The SYSCALL Target Address Register (STAR) contains the 
target EIP address used by the SYSCALL instruction, and 
contains the 16-bit selector base used by the SYSCALL and 
SYSRET instructions. Figure 29 shows the format of the STAR 
register, and Table 41 defines the fields of the STAR register. 
The STAR register is MSR C000_0081h. 

Field                               Bits 
Reserved                            63-48 
CS Selector and SS Selector Base    47-32 
Target EIP Address                  31-0 


Figure 29. SYSCALL Target Address Register (STAR) 


Table 41. SYSCALL Target Address Register (STAR) Definition 

Bit      Description                Function 
63-48    Reserved                   Writing a 1 to any reserved bit causes a general protection 
                                    fault to occur. All reserved bits are always read as 0. 
47-32    CS and SS Selector Base    During the SYSCALL instruction, this field is copied into the 
                                    CS register and the contents of this field, plus 8, are copied 
                                    into the SS register. During the SYSRET instruction, this field, 
                                    plus 16, is copied into the SS register, and bits 1-0 of the SS 
                                    register are set to 11b. 
31-0     Target EIP Address         This address is copied into the EIP and points to the new 
                                    starting address. 
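
An operating system has to assemble the 64-bit STAR value before writing it with WRMSR. The following sketch packs and unpacks the two defined fields, using bits 47-32 for the selector base and bits 31-0 for the target EIP and leaving the remaining bits at 0. The function names are hypothetical, and the WRMSR itself (ECX = C000_0081h) is not shown.

```c
#include <stdint.h>

/* Build a STAR value from the selector base and the SYSCALL entry point. */
static uint64_t star_pack(uint16_t selector_base, uint32_t target_eip)
{
    return ((uint64_t)selector_base << 32) | target_eip;
}

/* Extract the CS and SS selector base (bits 47-32). */
static uint16_t star_selector_base(uint64_t star)
{
    return (uint16_t)(star >> 32);
}

/* Extract the target EIP address (bits 31-0). */
static uint32_t star_target_eip(uint64_t star)
{
    return (uint32_t)star;
}
```

Because the upper 16 bits are never set by `star_pack`, the resulting value cannot trigger the general protection fault associated with writing reserved bits.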





Write Handling Control Register (WHCR) 

The AMD-K6 MMX processor contains a split level-1 (L1) 
64-Kbyte writeback cache organized as a separate 32-Kbyte 
instruction cache and a 32-Kbyte data cache with two-way set 
associativity. The cache line size is 32 bytes, and lines are read 
from memory using an efficient pipelined burst read cycle. 
Further performance gains are achieved by the 
implementation of a write allocation scheme. 


For more information about write allocate, see the 
Implementation of Write Allocate in the K86™ Processors 
Application Note, order# 21326. 


Write allocate, if enabled, occurs when the processor has a 
pending memory write cycle to a cacheable line and the line 
does not currently reside in the L1 cache. In this case, the 
processor performs a burst read cycle to fetch the cache line 
addressed by the pending write cycle. The data associated with 
the pending write cycle is merged with the recently-allocated 
cache line and stored in the processor’s L1 cache in the 
modified state. The cache line must be marked as modified 
because the pending write cycle is not performed on the 
processor’s external bus. 


Write Handling Control Register (WHCR). The Write Handling Control 
Register (WHCR) is an MSR that contains three fields—the 
WCDE bit, the Write Allocate Enable Limit (WAELIM) field, 
and the Write Allocate Enable 15-to-16-Mbyte (WAE15M) bit 
(See Figure 30). 


Symbol    Description                             Bits 
WCDE      Write Cacheability Detection Enable     8 
WAELIM    Write Allocate Enable Limit             7-1 
WAE15M    Write Allocate Enable 15-to-16-Mbyte    0 


Note: Hardware RESET initializes this MSR to all zeros. 


Figure 30. Write Handling Control Register (WHCR)—MSR C000_0082h 

Write Cacheability Detection Enable. Write Cacheability Detection 
causes a write allocate to occur only if the Write Cacheability 
Detection Enable (WCDE) bit (bit 8) in the Write Handling 
Control Register (WHCR) MSR is set to 1. For more details on 
the Write Cacheability Detection Mechanism, see the Cache 
Organization chapter in the AMD-K6™ MMX Processor Data 
Sheet, order# 20695. 


If the address is cacheable, support of the Write Cacheability 
Detection mechanism requires the system logic to assert KEN 
during a write cycle. Some chipsets assert KEN during write 
cycles and others do not. 
(Triton chipsets eventually generate a correct value for KEN, 
but not at the sample point. Therefore, do not enable 
WCDE in systems that use the Triton chipset.) If Write 
Cacheability Detection is enabled, KEN is sampled during 
write cycles in the same manner it is sampled during read 
cycles (KEN is sampled on the clock edge on which the first 
BRDY or NA of a cycle is sampled asserted). 

Write Allocate Enable Limit. The WAELIM field is 7 bits wide. This 
field, multiplied by 4 Mbytes, defines an upper memory limit. 
Any pending write cycle that addresses memory below this 
limit causes the processor to perform a write allocate. Write 
allocate is disabled for memory accesses at and above this 
limit unless the processor determines a pending write cycle is 
cacheable by means of one of the other Write Cacheability 
Detection mechanisms. The maximum value of this limit is 
((2^7 - 1) × 4 Mbytes) = 508 Mbytes. When all the bits in this field 
are set to 0, all memory is above this limit and the write 
allocate mechanism is disabled. 


Write Allocate Enable 15-to-16-Mbyte. The WAE15M bit is used to 
enable write allocations for the memory write cycles that 
address the 1 Mbyte of memory between 15 Mbytes and 16 
Mbytes. This bit must be set to 0 to prevent write allocates in 
this memory area. This sub-mechanism of the WAELIM 
provides a memory hole to prevent write allocates. This 
memory hole is provided to account for a small number of 
uncommon memory-mapped I/O adapters that use this 
particular memory address space. If the system contains one of 
these peripherals, the bit should be set to 0. The WAE15M bit is 
ignored if the value in the WAELIM field is set to less than 16 
Mbytes. 


By definition, write allocations in the AMD-K6 processor are 
never performed in the memory area between 640 Kbytes and 1 
Mbyte. It is not safe to perform write allocations between 640 
Kbytes and 1 Mbyte (000A_0000h to 000F_FFFFh) because it is 
considered a non-cacheable region of memory. 
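
The rules for the WAELIM field, the WAE15M bit, and the 640-Kbyte-to-1-Mbyte region can be combined into two small helpers: one that builds a WHCR value from the installed memory size, and one that predicts whether the limit mechanism alone would write-allocate a given address. This is a sketch under stated assumptions: memory size is given in Mbytes and rounded down to a 4-Mbyte multiple, the other Write Cacheability Detection mechanisms are ignored, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WHCR_WAE15M       (1u << 0)   /* bit 0 */
#define WHCR_WAELIM_SHIFT 1u          /* bits 7-1 */
#define WHCR_WAELIM_MAX   0x7Fu
#define WHCR_WCDE         (1u << 8)   /* bit 8 */
#define MBYTE             0x100000u

/* Build the low 32 bits of WHCR for mem_mbytes of memory (limit capped at 508). */
static uint32_t whcr_build(uint32_t mem_mbytes, bool wae15m, bool wcde)
{
    uint32_t waelim = mem_mbytes / 4u;
    if (waelim > WHCR_WAELIM_MAX)
        waelim = WHCR_WAELIM_MAX;
    return (waelim << WHCR_WAELIM_SHIFT)
         | (wae15m ? WHCR_WAE15M : 0u)
         | (wcde ? WHCR_WCDE : 0u);
}

/* Would the limit mechanism write-allocate a pending write to addr? */
static bool whcr_allocates(uint32_t whcr, uint32_t addr)
{
    uint32_t limit = ((whcr >> WHCR_WAELIM_SHIFT) & WHCR_WAELIM_MAX) * 4u * MBYTE;

    if (addr >= limit)
        return false;                           /* at or above the limit */
    if (addr >= 0x000A0000u && addr <= 0x000FFFFFu)
        return false;                           /* 640 Kbytes to 1 Mbyte */
    if (addr >= 15u * MBYTE && addr < 16u * MBYTE)
        return (whcr & WHCR_WAE15M) != 0u;      /* 15-to-16-Mbyte hole */
    return true;
}
```

For a system with 64 Mbytes and a memory-mapped adapter in the 15-to-16-Mbyte range, `whcr_build(64, false, false)` gives 20h, which allocates below 64 Mbytes except in the two excluded regions.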


See the Software Environment section of the AMD-K6 MMX 
Processor Data Sheet, order# 20695, for more information. 


Machine Check Exception 


The AMD-K6 processor does not support the generation of a 
machine check exception. 


The processor provides a 64-bit Machine Check Address 
Register (MCAR) and a 64-bit Machine Check Type Register 
(MCTR), but because the processor does not support machine 
check exceptions, the contents of the MCAR and MCTR are 
only affected by the WRMSR instruction and by RESET being 
sampled asserted (where all bits in each register are reset to 0). 


The processor also provides the Machine Check Exception 
(MCE) bit in Control Register 4 (CR4, bit 6) as a read-write bit. 
However, the state of this bit has no effect on the operation of 
the processor. 


The processor does not provide the BUSCHK and PEN signals 
provided by Pentium. 


New AMD-K6™ MMX Processor Instructions 


This section documents and explains the new instructions 
added to the AMD-K6 processor above and beyond the AMD-K5 
processor. 

■ SYSCALL 
■ SYSRET 
■ MMX—57 new Multimedia Extension instructions 
See “Multimedia Extensions (MMX)” on page 127. 


System Call Extensions 

Setting bit 0 (SCE) in the Extended Feature Enable Register 
(see “Extended Feature Enable Register (EFER)” on 
page 118) enables the system call extensions. The system call 
extensions consist of two new instructions, SYSCALL and 
SYSRET, that allow OS vendors fast protection-level switching 
to and from CPL0. 

SYSCALL 

mnemonic    opcode    description 
SYSCALL     0F05h     Call operating system 

Privilege: none 
Registers Affected: ECX, EIP, CS, SS 
Flags Affected: IF, VM 
Machine State Affected: CPL, CS (base, limit, attr), SS (base, limit, attr) 


Exceptions Generated: 

Exception             Real   Virtual 8086   Protected   Description 
Invalid opcode (6)    X      X              X           The System Call Extension bit (SCE) of the 
                                                        Extended Feature Enable Register (EFER) is 
                                                        set to 0. (The EFER register is MSR C000_0080h.) 


The SYSCALL instruction provides a fast method for transferring control to a fixed 
entry point in an operating system. 


The EIP register is copied into the ECX register. Bits 31-0 of the 64-bit SYSCALL 
Target Address Register (see “SYSCALL Target Address Register (STAR)” on 
page 118) are copied into the EIP register. (The STAR register is Model-Specific 
Register C000_0081h.) 


The IF and VM flags are set to 0 to disable interrupts and force the processor out of 
Virtual-8086 mode. 


New selectors are loaded with no checking performed as follows: 

■ Bits 47-32 of the STAR register are copied into the CS register 
■ (Bits 47-32 of the STAR register) + 8 are copied into the SS register 


The CS and SS registers must not be modified by the operating system between the 
execution of the SYSCALL instruction and its corresponding SYSRET instruction. 


The processor’s CPL is set to 0 regardless of the value of bits 33-32 of the STAR 
register. There are no permission checks of the CPL, Real mode, or Virtual-8086 
mode. 

The following descriptors are loaded to specify fixed 4-Gbyte flat segments: 

■ The CS_base and the SS_base are both set to zero 
■ The CS_limit and the SS_limit are both set to 4-Gbyte 
■ The CS segment attributes are set to Read-only 
■ The SS segment attributes are set to Read-Write and Expand-Up 

The operating system must set the STAR register and the appropriate descriptor 
table entries to reflect the values loaded by the processor during the SYSCALL 
instruction. 


Related Instructions See the SYSRET instruction. 

SYSRET 

mnemonic    opcode    description 
SYSRET      0F07h     Return from operating system 

Privilege: CPL = 0 
Registers Affected: EIP, CS, SS 
Flags Affected: IF 
Machine State Affected: CPL, CS (base, limit, attr) 


Exceptions Generated: 

Exception                  Description 
Invalid opcode (6)         The System Call Extension bit (SCE) of the Extended Feature Enable 
                           Register (EFER) is set to 0. (The EFER register is MSR C000_0080h.) 
General protection (13)    The CPL is not equal to 0. 


The SYSRET instruction is the return instruction used in conjunction with the 
SYSCALL instruction to provide fast entry/exit to an operating system. 





The ECX register, which points to the next sequential instruction after the 
corresponding SYSCALL instruction, is copied into the EIP register. 


The IF flag is set to 1 in order to enable interrupts. 


New selectors are loaded without any checking as follows: 


■ Bits 47-32 of the STAR register are copied into the CS register 

■ Bits 1-0 of the CS register are set to 11b (CPL of 3), regardless of the value of bits 
33-32 of the STAR register 

■ (Bits 47-32 of the STAR register) + 16 are copied into the SS register 

■ Bits 1-0 of the SS register are set to 11b (RPL of 3), regardless of the value of bits 
33-32 of the STAR register 


The CS and SS registers must not be modified by the operating system between the 
execution of the SYSCALL instruction and its corresponding SYSRET instruction. 


If the CPL is not equal to 0 when the SYSRET instruction is executed, a general 
protection fault exception is generated with an error code of 0. 

A new descriptor is loaded for CS to specify a fixed 4-Gbyte flat segment as follows: 

■ The CS_base is set to zero 
■ The CS_limit is set to 4-Gbyte 
■ The CS segment attributes are set to Read-only 


The operating system must set the STAR register and the appropriate descriptor 
table entries to reflect the values loaded by the processor during the SYSCALL 
instruction. 
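
The register transfers described for SYSCALL and SYSRET can be summarized in a small software model. The sketch below is not processor code; it only tracks the registers and flags named in the two instruction descriptions, assumes the caller supplies the address of the instruction that follows SYSCALL, and ignores the SCE check and the descriptor caches. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of processor state touched by SYSCALL and SYSRET. */
typedef struct {
    uint32_t eip, ecx;
    uint16_t cs, ss;
    unsigned cpl;
    int if_flag, vm_flag;
} cpu_state;

/* Model of SYSCALL: next_eip is the address following the SYSCALL opcode. */
static void model_syscall(cpu_state *s, uint64_t star, uint32_t next_eip)
{
    uint16_t base = (uint16_t)(star >> 32);   /* STAR bits 47-32 */

    s->ecx = next_eip;                        /* return address saved in ECX */
    s->eip = (uint32_t)star;                  /* STAR bits 31-0 */
    s->cs = base;
    s->ss = (uint16_t)(base + 8u);
    s->cpl = 0;
    s->if_flag = 0;                           /* interrupts disabled */
    s->vm_flag = 0;                           /* leave Virtual-8086 mode */
}

/* Model of SYSRET: returns -1 (general protection) if CPL is not 0. */
static int model_sysret(cpu_state *s, uint64_t star)
{
    uint16_t base = (uint16_t)(star >> 32);

    if (s->cpl != 0)
        return -1;
    s->eip = s->ecx;
    s->cs = (uint16_t)(base | 3u);            /* bits 1-0 forced to 11b */
    s->ss = (uint16_t)((base + 16u) | 3u);
    s->cpl = 3;
    s->if_flag = 1;                           /* interrupts re-enabled */
    return 0;
}
```

The model makes the descriptor table requirement visible: with a selector base of 0008h, the operating system needs matching flat descriptors at 0008h and 0010h for SYSCALL and at 0008h and 0018h for SYSRET.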


Related Instructions See the SYSCALL instruction. 

Multimedia Extensions (MMX) 


The AMD-K6 MMX processor implements the complete MMX 
instruction set. For a detailed description refer to the 
AMD-K6™ MMX Processor Multimedia Extensions (MMX) 
document, order# 20726, located at http://www.amd.com. Table 
42 lists the MMX instructions. 


Table 42. MMX Instructions and Descriptions 

































HOV, 
PACKSSWB/PACKSSDW 
PADDB/PADDW/PADDD 
PADDSB/PADDSW 
PCMPEQB/PCMPEQW/PCMPEQD 
PCMPGTB/PCMPGTW/PCMPGTD 
PMADDWD 
PMULHW 
PMULLW 
PSLLW/PSLLD/PSLLQ 
PSRAW/PSRAD 
PSRLW/PSRLD/PSRLQ 
PSUBB/PSUBW/PSUBD 
PSUBSB/PSUBSW 
PSUBUSB/PSUBUSW 
PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ 
PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ 